Re: WARNING in try_charge
On 2018/08/10 0:07, Michal Hocko wrote: > On Thu 09-08-18 22:57:43, Tetsuo Handa wrote: >> >From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001 >> From: Tetsuo Handa >> Date: Thu, 9 Aug 2018 22:49:40 +0900 >> Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until >> memory reserve fails >> >> Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip >> oom_reaped tasks") changed to select next OOM victim as soon as >> MMF_OOM_SKIP is set. But we don't need to select next OOM victim as >> long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1) >> caused by this race window [1]. > > It is not because the syzbot was exercising a completely different code > path (memcg charge rather than the page allocator). I know syzbot is hitting memcg charge path. > >> Since memcg OOM case uses forced charge if current thread is killed, >> out_of_memory() can return true without selecting next OOM victim. >> Therefore, this patch changes task_will_free_mem(current) to ignore >> MMF_OOM_SKIP unless ALLOC_OOM allocation failed. > > And the patch is simply wrong for memcg. > Why? I think I should have done -+ page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM -+ || (gfp_mask & __GFP_NOMEMALLOC), ac, -+ _some_progress); ++ page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM, ++ ac, _some_progress); because nobody will use __GFP_NOMEMALLOC | __GFP_NOFAIL. But for memcg charge path, task_will_free_mem(current, false) == true and out_of_memory() will return true, which avoids unnecessary OOM killing. Of course, this patch cannot avoid unnecessary OOM killing if out_of_memory() is called by not yet killed process. But to mitigate it, what can we do other than defer setting MMF_OOM_SKIP using a timeout based mechanism? Making the OOM reaper unconditionally reclaim all memory is not a valid answer.
Re: WARNING in try_charge
On 2018/08/10 0:07, Michal Hocko wrote: > On Thu 09-08-18 22:57:43, Tetsuo Handa wrote: >> >From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001 >> From: Tetsuo Handa >> Date: Thu, 9 Aug 2018 22:49:40 +0900 >> Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until >> memory reserve fails >> >> Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip >> oom_reaped tasks") changed to select next OOM victim as soon as >> MMF_OOM_SKIP is set. But we don't need to select next OOM victim as >> long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1) >> caused by this race window [1]. > > It is not because the syzbot was exercising a completely different code > path (memcg charge rather than the page allocator). I know syzbot is hitting memcg charge path. > >> Since memcg OOM case uses forced charge if current thread is killed, >> out_of_memory() can return true without selecting next OOM victim. >> Therefore, this patch changes task_will_free_mem(current) to ignore >> MMF_OOM_SKIP unless ALLOC_OOM allocation failed. > > And the patch is simply wrong for memcg. > Why? I think I should have done -+ page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM -+ || (gfp_mask & __GFP_NOMEMALLOC), ac, -+ _some_progress); ++ page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM, ++ ac, _some_progress); because nobody will use __GFP_NOMEMALLOC | __GFP_NOFAIL. But for memcg charge path, task_will_free_mem(current, false) == true and out_of_memory() will return true, which avoids unnecessary OOM killing. Of course, this patch cannot avoid unnecessary OOM killing if out_of_memory() is called by not yet killed process. But to mitigate it, what can we do other than defer setting MMF_OOM_SKIP using a timeout based mechanism? Making the OOM reaper unconditionally reclaim all memory is not a valid answer.
Re: WARNING in try_charge
On Thu, Aug 09, 2018 at 10:57:43PM +0900, Tetsuo Handa wrote: > From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa > Date: Thu, 9 Aug 2018 22:49:40 +0900 > Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until > memory reserve fails > > Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip > oom_reaped tasks") changed to select next OOM victim as soon as > MMF_OOM_SKIP is set. But we don't need to select next OOM victim as > long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1) > caused by this race window [1]. Huh? That's the memcg path, it has nothing to do with ALLOC_OOM. > Since memcg OOM case uses forced charge if current thread is killed, > out_of_memory() can return true without selecting next OOM victim. > Therefore, this patch changes task_will_free_mem(current) to ignore > MMF_OOM_SKIP unless ALLOC_OOM allocation failed. I have no idea how the first and the second half of this paragraph go together. They're completely independent code paths.
Re: WARNING in try_charge
On Thu, Aug 09, 2018 at 10:57:43PM +0900, Tetsuo Handa wrote: > From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa > Date: Thu, 9 Aug 2018 22:49:40 +0900 > Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until > memory reserve fails > > Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip > oom_reaped tasks") changed to select next OOM victim as soon as > MMF_OOM_SKIP is set. But we don't need to select next OOM victim as > long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1) > caused by this race window [1]. Huh? That's the memcg path, it has nothing to do with ALLOC_OOM. > Since memcg OOM case uses forced charge if current thread is killed, > out_of_memory() can return true without selecting next OOM victim. > Therefore, this patch changes task_will_free_mem(current) to ignore > MMF_OOM_SKIP unless ALLOC_OOM allocation failed. I have no idea how the first and the second half of this paragraph go together. They're completely independent code paths.
Re: WARNING in try_charge
On Thu 09-08-18 22:57:43, Tetsuo Handa wrote: > >From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa > Date: Thu, 9 Aug 2018 22:49:40 +0900 > Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until > memory reserve fails > > Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip > oom_reaped tasks") changed to select next OOM victim as soon as > MMF_OOM_SKIP is set. But we don't need to select next OOM victim as > long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1) > caused by this race window [1]. It is not because the syzbot was exercising a completely different code path (memcg charge rather than the page allocator). > Since memcg OOM case uses forced charge if current thread is killed, > out_of_memory() can return true without selecting next OOM victim. > Therefore, this patch changes task_will_free_mem(current) to ignore > MMF_OOM_SKIP unless ALLOC_OOM allocation failed. And the patch is simply wrong for memcg. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Thu 09-08-18 22:57:43, Tetsuo Handa wrote: > >From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa > Date: Thu, 9 Aug 2018 22:49:40 +0900 > Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until > memory reserve fails > > Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip > oom_reaped tasks") changed to select next OOM victim as soon as > MMF_OOM_SKIP is set. But we don't need to select next OOM victim as > long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1) > caused by this race window [1]. It is not because the syzbot was exercising a completely different code path (memcg charge rather than the page allocator). > Since memcg OOM case uses forced charge if current thread is killed, > out_of_memory() can return true without selecting next OOM victim. > Therefore, this patch changes task_will_free_mem(current) to ignore > MMF_OOM_SKIP unless ALLOC_OOM allocation failed. And the patch is simply wrong for memcg. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
>From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Thu, 9 Aug 2018 22:49:40 +0900 Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until memory reserve fails Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip oom_reaped tasks") changed to select next OOM victim as soon as MMF_OOM_SKIP is set. But we don't need to select next OOM victim as long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1) caused by this race window [1]. Since memcg OOM case uses forced charge if current thread is killed, out_of_memory() can return true without selecting next OOM victim. Therefore, this patch changes task_will_free_mem(current) to ignore MMF_OOM_SKIP unless ALLOC_OOM allocation failed. [1] https://syzkaller.appspot.com/bug?id=ea8c7912757d253537375e981b61749b2da69258 Signed-off-by: Tetsuo Handa Reported-by: syzbot Cc: Michal Hocko Cc: Oleg Nesterov Cc: Vladimir Davydov Cc: David Rientjes --- include/linux/oom.h | 3 +++ mm/oom_kill.c | 8 mm/page_alloc.c | 7 +-- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/include/linux/oom.h b/include/linux/oom.h index 69864a5..b5abacd 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -38,6 +38,9 @@ struct oom_control { */ const int order; + /* Did we already try ALLOC_OOM allocation? i*/ + const bool reserve_tried; + /* Used by oom implementation, do not set */ unsigned long totalpages; struct task_struct *chosen; diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 0e10b86..95453e8 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -782,7 +782,7 @@ static inline bool __task_will_free_mem(struct task_struct *task) * Caller has to make sure that task->mm is stable (hold task_lock or * it operates on the current). */ -static bool task_will_free_mem(struct task_struct *task) +static bool task_will_free_mem(struct task_struct *task, bool select_new) { struct mm_struct *mm = task->mm; struct task_struct *p; @@ -803,7 +803,7 @@ static bool task_will_free_mem(struct task_struct *task) * This task has already been drained by the oom reaper so there are * only small chances it will free some more */ - if (test_bit(MMF_OOM_SKIP, >flags)) + if (test_bit(MMF_OOM_SKIP, >flags) && select_new) return false; if (atomic_read(>mm_users) <= 1) @@ -939,7 +939,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message) * so it can die quickly */ task_lock(p); - if (task_will_free_mem(p)) { + if (task_will_free_mem(p, true)) { mark_oom_victim(p); wake_oom_reaper(p); task_unlock(p); @@ -1069,7 +1069,7 @@ bool out_of_memory(struct oom_control *oc) * select it. The goal is to allow it to allocate so that it may * quickly exit and free its memory. */ - if (task_will_free_mem(current)) { + if (task_will_free_mem(current, oc->reserve_tried)) { mark_oom_victim(current); wake_oom_reaper(current); return true; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 879b861..03ca29a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3455,7 +3455,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...) } static inline struct page * -__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, +__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, bool reserve_tried, const struct alloc_context *ac, unsigned long *did_some_progress) { struct oom_control oc = { @@ -3464,6 +3464,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...) .memcg = NULL, .gfp_mask = gfp_mask, .order = order, + .reserve_tried = reserve_tried, }; struct page *page; @@ -4239,7 +4240,9 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) goto retry_cpuset; /* Reclaim has failed us, start killing things */ - page = __alloc_pages_may_oom(gfp_mask, order, ac, _some_progress); + page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM +|| (gfp_mask & __GFP_NOMEMALLOC), ac, +_some_progress); if (page) goto got_pg; -- 1.8.3.1
Re: WARNING in try_charge
>From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Thu, 9 Aug 2018 22:49:40 +0900 Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until memory reserve fails Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip oom_reaped tasks") changed to select next OOM victim as soon as MMF_OOM_SKIP is set. But we don't need to select next OOM victim as long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1) caused by this race window [1]. Since memcg OOM case uses forced charge if current thread is killed, out_of_memory() can return true without selecting next OOM victim. Therefore, this patch changes task_will_free_mem(current) to ignore MMF_OOM_SKIP unless ALLOC_OOM allocation failed. [1] https://syzkaller.appspot.com/bug?id=ea8c7912757d253537375e981b61749b2da69258 Signed-off-by: Tetsuo Handa Reported-by: syzbot Cc: Michal Hocko Cc: Oleg Nesterov Cc: Vladimir Davydov Cc: David Rientjes --- include/linux/oom.h | 3 +++ mm/oom_kill.c | 8 mm/page_alloc.c | 7 +-- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/include/linux/oom.h b/include/linux/oom.h index 69864a5..b5abacd 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -38,6 +38,9 @@ struct oom_control { */ const int order; + /* Did we already try ALLOC_OOM allocation? i*/ + const bool reserve_tried; + /* Used by oom implementation, do not set */ unsigned long totalpages; struct task_struct *chosen; diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 0e10b86..95453e8 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -782,7 +782,7 @@ static inline bool __task_will_free_mem(struct task_struct *task) * Caller has to make sure that task->mm is stable (hold task_lock or * it operates on the current). */ -static bool task_will_free_mem(struct task_struct *task) +static bool task_will_free_mem(struct task_struct *task, bool select_new) { struct mm_struct *mm = task->mm; struct task_struct *p; @@ -803,7 +803,7 @@ static bool task_will_free_mem(struct task_struct *task) * This task has already been drained by the oom reaper so there are * only small chances it will free some more */ - if (test_bit(MMF_OOM_SKIP, >flags)) + if (test_bit(MMF_OOM_SKIP, >flags) && select_new) return false; if (atomic_read(>mm_users) <= 1) @@ -939,7 +939,7 @@ static void oom_kill_process(struct oom_control *oc, const char *message) * so it can die quickly */ task_lock(p); - if (task_will_free_mem(p)) { + if (task_will_free_mem(p, true)) { mark_oom_victim(p); wake_oom_reaper(p); task_unlock(p); @@ -1069,7 +1069,7 @@ bool out_of_memory(struct oom_control *oc) * select it. The goal is to allow it to allocate so that it may * quickly exit and free its memory. */ - if (task_will_free_mem(current)) { + if (task_will_free_mem(current, oc->reserve_tried)) { mark_oom_victim(current); wake_oom_reaper(current); return true; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 879b861..03ca29a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3455,7 +3455,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...) } static inline struct page * -__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, +__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, bool reserve_tried, const struct alloc_context *ac, unsigned long *did_some_progress) { struct oom_control oc = { @@ -3464,6 +3464,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, const char *fmt, ...) .memcg = NULL, .gfp_mask = gfp_mask, .order = order, + .reserve_tried = reserve_tried, }; struct page *page; @@ -4239,7 +4240,9 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask) goto retry_cpuset; /* Reclaim has failed us, start killing things */ - page = __alloc_pages_may_oom(gfp_mask, order, ac, _some_progress); + page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM +|| (gfp_mask & __GFP_NOMEMALLOC), ac, +_some_progress); if (page) goto got_pg; -- 1.8.3.1
Re: WARNING in try_charge
On Tue 07-08-18 13:18:00, Dmitry Vyukov wrote: [...] > Great we are making progress here! > > So if it's something to fix in kernel we just leave WARN alone. It > served its intended purpose of notifying kernel developers about > something to fix in kernel. Yes, agreed! And the way how your syzbot automation works made it so much easier so thanks a lot! The patch has been posted http://lkml.kernel.org/r/20180807072553.14941-1-mho...@kernel.org -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Tue 07-08-18 13:18:00, Dmitry Vyukov wrote: [...] > Great we are making progress here! > > So if it's something to fix in kernel we just leave WARN alone. It > served its intended purpose of notifying kernel developers about > something to fix in kernel. Yes, agreed! And the way how your syzbot automation works made it so much easier so thanks a lot! The patch has been posted http://lkml.kernel.org/r/20180807072553.14941-1-mho...@kernel.org -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 8:55 PM, Michal Hocko wrote: > The debugging patch was wrong but I guess I see it finally. > It's a race > > : [ 72.901666] Memory cgroup out of memory: Kill process 6584 > (syz-executor1) score 55 or sacrifice child > : [ 72.917037] Killed process 6584 (syz-executor1) total-vm:37704kB, > anon-rss:2140kB, file-rss:0kB, shmem-rss:0kB > : [ 72.927256] task=syz-executor5 pid=6581 charge bypass > : [ 72.928046] oom_reaper: reaped process 6584 (syz-executor1), now > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > : [ 72.932818] task=syz-executor6 pid=6576 invoked memcg oom killer. > oom_victim=1 > : [ 72.942790] task=syz-executor5 pid=6581 charge for nr_pages=1 > : [ 72.949769] syz-executor6 invoked oom-killer: > gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, > oom_score_adj=0 > : [ 72.955606] task=syz-executor5 pid=6581 charge bypass > : [ 72.967394] syz-executor6 cpuset=/ mems_allowed=0 > : [ 72.973175] task=syz-executor5 pid=6581 charge for nr_pages=1 > : [...] > : [ 73.534865] Task in /ile0 killed as a result of limit of /ile0 > : [ 73.540865] memory: usage 76kB, limit 0kB, failcnt 260 > : [ 73.546142] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 73.552898] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 73.559051] Memory cgroup stats for /ile0: cache:0KB rss:0KB > rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB > inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB > unevictable:0KB > : [ 73.578533] Tasks state (memory values in pages): > : [ 73.583404] [ pid ] uid tgid total_vm rss pgtables_bytes > swapents oom_score_adj name > : [ 73.592277] [ 6569] 0 6562 94271532480 > 0 syz-executor0 > : [ 73.601299] [ 6576] 0 6576 94260614400 > 0 syz-executor6 > : [ 73.610333] [ 6578] 0 6578 9426 534614400 > 0 syz-executor4 > : [ 73.619381] [ 6579] 0 6579 94260573440 > 0 syz-executor5 > : [ 73.628414] [ 6582] 0 6582 94260614400 > 0 syz-executor7 > : [ 73.637441] [ 6584] 0 6584 94260573440 > 0 syz-executor1 > : [ 73.646464] Memory cgroup out of memory: Kill process 6578 > (syz-executor4) score 549000 or sacrifice child > : [ 73.656295] task=syz-executor6 pid=6576 is oom victim now > > This should be 6578 but we at least know that we are running in 6576 > context so the we are setting the state from a remote context which > itself has been killed already > > : [ 73.661841] Killed process 6578 (syz-executor4) total-vm:37704kB, > anon-rss:2136kB, file-rss:0kB, shmem-rss:0kB > : [ 73.672035] task=syz-executor6 pid=6576 charge bypass > : [ 73.672801] oom_reaper: reaped process 6578 (syz-executor4), now > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > : [ 73.678829] task=syz-executor4 pid=6578 invoked memcg oom killer. > oom_victim=1 > > and here the victim finally reached the oom path finally. > > : [ 73.687453] task=syz-executor6 pid=6576 charge for nr_pages=1 > : [ 73.694534] [ cut here ] > : [ 73.700424] task=syz-executor6 pid=6576 charge bypass > : [ 73.705175] Memory cgroup charge failed because of no reclaimable > memory! This looks like a misconfiguration or a kernel bug. > : [ 73.705321] WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 > try_charge+0xafa/0x1710 > > But there is nobody killable. So the oom kill happened _after_ our force > charge path. Therefore we should do the following regardless whether we > make tis warn or pr_$foo Great we are making progress here! So if it's something to fix in kernel we just leave WARN alone. It served its intended purpose of notifying kernel developers about something to fix in kernel. And as you noted 0 is not actually special in this context anyway. I misunderstood how exactly misconfiguration is involved here. > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > 116b181bb646afedd770985de20a68721bdb2648 > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4603ad75c9a9..1b6eed1bc404 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup > *memcg, gfp_t mask, int > return OOM_ASYNC; > } > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > + if (mem_cgroup_out_of_memory(memcg, mask, order) || > + tsk_is_oom_victim(current)) > return OOM_SUCCESS; > > WARN(1,"Memory cgroup charge failed because of no reclaimable memory! > " > -- > Michal Hocko > SUSE Labs
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 8:55 PM, Michal Hocko wrote: > The debugging patch was wrong but I guess I see it finally. > It's a race > > : [ 72.901666] Memory cgroup out of memory: Kill process 6584 > (syz-executor1) score 55 or sacrifice child > : [ 72.917037] Killed process 6584 (syz-executor1) total-vm:37704kB, > anon-rss:2140kB, file-rss:0kB, shmem-rss:0kB > : [ 72.927256] task=syz-executor5 pid=6581 charge bypass > : [ 72.928046] oom_reaper: reaped process 6584 (syz-executor1), now > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > : [ 72.932818] task=syz-executor6 pid=6576 invoked memcg oom killer. > oom_victim=1 > : [ 72.942790] task=syz-executor5 pid=6581 charge for nr_pages=1 > : [ 72.949769] syz-executor6 invoked oom-killer: > gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, > oom_score_adj=0 > : [ 72.955606] task=syz-executor5 pid=6581 charge bypass > : [ 72.967394] syz-executor6 cpuset=/ mems_allowed=0 > : [ 72.973175] task=syz-executor5 pid=6581 charge for nr_pages=1 > : [...] > : [ 73.534865] Task in /ile0 killed as a result of limit of /ile0 > : [ 73.540865] memory: usage 76kB, limit 0kB, failcnt 260 > : [ 73.546142] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 73.552898] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 73.559051] Memory cgroup stats for /ile0: cache:0KB rss:0KB > rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB > inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB > unevictable:0KB > : [ 73.578533] Tasks state (memory values in pages): > : [ 73.583404] [ pid ] uid tgid total_vm rss pgtables_bytes > swapents oom_score_adj name > : [ 73.592277] [ 6569] 0 6562 94271532480 > 0 syz-executor0 > : [ 73.601299] [ 6576] 0 6576 94260614400 > 0 syz-executor6 > : [ 73.610333] [ 6578] 0 6578 9426 534614400 > 0 syz-executor4 > : [ 73.619381] [ 6579] 0 6579 94260573440 > 0 syz-executor5 > : [ 73.628414] [ 6582] 0 6582 94260614400 > 0 syz-executor7 > : [ 73.637441] [ 6584] 0 6584 94260573440 > 0 syz-executor1 > : [ 73.646464] Memory cgroup out of memory: Kill process 6578 > (syz-executor4) score 549000 or sacrifice child > : [ 73.656295] task=syz-executor6 pid=6576 is oom victim now > > This should be 6578 but we at least know that we are running in 6576 > context so the we are setting the state from a remote context which > itself has been killed already > > : [ 73.661841] Killed process 6578 (syz-executor4) total-vm:37704kB, > anon-rss:2136kB, file-rss:0kB, shmem-rss:0kB > : [ 73.672035] task=syz-executor6 pid=6576 charge bypass > : [ 73.672801] oom_reaper: reaped process 6578 (syz-executor4), now > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > : [ 73.678829] task=syz-executor4 pid=6578 invoked memcg oom killer. > oom_victim=1 > > and here the victim finally reached the oom path finally. > > : [ 73.687453] task=syz-executor6 pid=6576 charge for nr_pages=1 > : [ 73.694534] [ cut here ] > : [ 73.700424] task=syz-executor6 pid=6576 charge bypass > : [ 73.705175] Memory cgroup charge failed because of no reclaimable > memory! This looks like a misconfiguration or a kernel bug. > : [ 73.705321] WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 > try_charge+0xafa/0x1710 > > But there is nobody killable. So the oom kill happened _after_ our force > charge path. Therefore we should do the following regardless whether we > make tis warn or pr_$foo Great we are making progress here! So if it's something to fix in kernel we just leave WARN alone. It served its intended purpose of notifying kernel developers about something to fix in kernel. And as you noted 0 is not actually special in this context anyway. I misunderstood how exactly misconfiguration is involved here. > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > 116b181bb646afedd770985de20a68721bdb2648 > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4603ad75c9a9..1b6eed1bc404 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup > *memcg, gfp_t mask, int > return OOM_ASYNC; > } > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > + if (mem_cgroup_out_of_memory(memcg, mask, order) || > + tsk_is_oom_victim(current)) > return OOM_SUCCESS; > > WARN(1,"Memory cgroup charge failed because of no reclaimable memory! > " > -- > Michal Hocko > SUSE Labs
Re: WARNING in try_charge
On 2018/08/07 6:50, Tetsuo Handa wrote: > list_for_each_entry_safe(p, tmp, _victim_list, oom_victim_list) { > struct mm_struct *mm = p->signal->oom_mm; > > if (oom_unkillable_task(p, oc->memcg, oc->nodemask)) > continue; > ret = true; > + /* > + * Since memcg OOM allows forced charge, we can safely wait > + * until __mmput() completes. > + */ > + if (is_memcg_oom(oc)) > + return true; Oops. If this OOM victim was blocked on some lock which current thread is holding, waiting forever unconditionally is not safe. > #ifdef CONFIG_MMU > /* >* Since the OOM reaper exists, we can safely wait until >* MMF_OOM_SKIP is set. >*/ > if (!test_bit(MMF_OOM_SKIP, >flags)) { > if (!oom_reap_target) { > get_task_struct(p); > oom_reap_target = p; > trace_wake_reaper(p->pid); > wake_up(_reaper_wait); > } > #endif > continue; > } > #endif > /* We can wait as long as OOM score is decreasing over time. */ > if (!victim_mm_stalling(p, mm)) > continue; > gaveup = true; > list_del(>oom_victim_list); > /* Drop a reference taken by mark_oom_victim(). */ > put_task_struct(p); > } > if (gaveup) > debug_show_all_locks(); > > return ret; > } >
Re: WARNING in try_charge
On 2018/08/07 6:50, Tetsuo Handa wrote: > list_for_each_entry_safe(p, tmp, _victim_list, oom_victim_list) { > struct mm_struct *mm = p->signal->oom_mm; > > if (oom_unkillable_task(p, oc->memcg, oc->nodemask)) > continue; > ret = true; > + /* > + * Since memcg OOM allows forced charge, we can safely wait > + * until __mmput() completes. > + */ > + if (is_memcg_oom(oc)) > + return true; Oops. If this OOM victim was blocked on some lock which current thread is holding, waiting forever unconditionally is not safe. > #ifdef CONFIG_MMU > /* >* Since the OOM reaper exists, we can safely wait until >* MMF_OOM_SKIP is set. >*/ > if (!test_bit(MMF_OOM_SKIP, >flags)) { > if (!oom_reap_target) { > get_task_struct(p); > oom_reap_target = p; > trace_wake_reaper(p->pid); > wake_up(_reaper_wait); > } > #endif > continue; > } > #endif > /* We can wait as long as OOM score is decreasing over time. */ > if (!victim_mm_stalling(p, mm)) > continue; > gaveup = true; > list_del(>oom_victim_list); > /* Drop a reference taken by mark_oom_victim(). */ > put_task_struct(p); > } > if (gaveup) > debug_show_all_locks(); > > return ret; > } >
Re: WARNING in try_charge
On 2018/08/07 5:55, Michal Hocko wrote: > On Tue 07-08-18 05:46:04, Tetsuo Handa wrote: >> On 2018/08/07 5:34, Michal Hocko wrote: >>> On Tue 07-08-18 05:26:23, Tetsuo Handa wrote: On 2018/08/07 2:56, Michal Hocko wrote: > So the oom victim indeed passed the above force path after the oom > invocation. But later on hit the page fault path and that behaved > differently and for some reason the force path hasn't triggered. I am > wondering how could we hit the page fault path in the first place. The > task is already killed! So what the hell is going on here. > > I must be missing something obvious here. > YOU ARE OBVIOUSLY MISSING MY MAIL! I already said this is "mm, oom: task_will_free_mem(current) should ignore MMF_OOM_SKIP for once." problem which you are refusing at https://www.spinics.net/lists/linux-mm/msg133774.html . And you again ignored my mail. Very sad... >>> >>> Your suggestion simply didn't make much sense. There is nothing like >>> first check is different from the rest. >>> >> >> I don't think your patch is appropriate. It avoids hitting WARN(1) but does >> not avoid >> unnecessary killing of OOM victims. >> >> If you look at >> https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 , you will >> notice that both 23766 and 23767 are killed due to >> task_will_free_mem(current) == false. >> This is "unnecessary killing of additional processes". > > Have you noticed the mere detail that the memcg has to kill any task > attempting the charge because the hard limit is 0? There is simply no > other way around. You cannot charge. There is no unnecessary killing. > Full stop. We do allow temporary breach of the hard limit just to let > the task die and uncharge on the way out. > select_bad_process() is called just because task_will_free_mem("already killed current thread which has not completed __mmput()") == false is a bug. I'm saying that the OOM killer should not give up as soon as MMF_OOM_SKIP is set. static bool oom_has_pending_victims(struct oom_control *oc) { struct task_struct *p, *tmp; bool ret = false; bool gaveup = false; if (is_sysrq_oom(oc)) return false; /* * Wait for pending victims until __mmput() completes or stalled * too long. */ list_for_each_entry_safe(p, tmp, _victim_list, oom_victim_list) { struct mm_struct *mm = p->signal->oom_mm; if (oom_unkillable_task(p, oc->memcg, oc->nodemask)) continue; ret = true; + /* +* Since memcg OOM allows forced charge, we can safely wait +* until __mmput() completes. +*/ + if (is_memcg_oom(oc)) + return true; #ifdef CONFIG_MMU /* * Since the OOM reaper exists, we can safely wait until * MMF_OOM_SKIP is set. */ if (!test_bit(MMF_OOM_SKIP, >flags)) { if (!oom_reap_target) { get_task_struct(p); oom_reap_target = p; trace_wake_reaper(p->pid); wake_up(_reaper_wait); } #endif continue; } #endif /* We can wait as long as OOM score is decreasing over time. */ if (!victim_mm_stalling(p, mm)) continue; gaveup = true; list_del(>oom_victim_list); /* Drop a reference taken by mark_oom_victim(). */ put_task_struct(p); } if (gaveup) debug_show_all_locks(); return ret; }
Re: WARNING in try_charge
On 2018/08/07 5:55, Michal Hocko wrote: > On Tue 07-08-18 05:46:04, Tetsuo Handa wrote: >> On 2018/08/07 5:34, Michal Hocko wrote: >>> On Tue 07-08-18 05:26:23, Tetsuo Handa wrote: On 2018/08/07 2:56, Michal Hocko wrote: > So the oom victim indeed passed the above force path after the oom > invocation. But later on hit the page fault path and that behaved > differently and for some reason the force path hasn't triggered. I am > wondering how could we hit the page fault path in the first place. The > task is already killed! So what the hell is going on here. > > I must be missing something obvious here. > YOU ARE OBVIOUSLY MISSING MY MAIL! I already said this is "mm, oom: task_will_free_mem(current) should ignore MMF_OOM_SKIP for once." problem which you are refusing at https://www.spinics.net/lists/linux-mm/msg133774.html . And you again ignored my mail. Very sad... >>> >>> Your suggestion simply didn't make much sense. There is nothing like >>> first check is different from the rest. >>> >> >> I don't think your patch is appropriate. It avoids hitting WARN(1) but does >> not avoid >> unnecessary killing of OOM victims. >> >> If you look at >> https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 , you will >> notice that both 23766 and 23767 are killed due to >> task_will_free_mem(current) == false. >> This is "unnecessary killing of additional processes". > > Have you noticed the mere detail that the memcg has to kill any task > attempting the charge because the hard limit is 0? There is simply no > other way around. You cannot charge. There is no unnecessary killing. > Full stop. We do allow temporary breach of the hard limit just to let > the task die and uncharge on the way out. > select_bad_process() is called just because task_will_free_mem("already killed current thread which has not completed __mmput()") == false is a bug. I'm saying that the OOM killer should not give up as soon as MMF_OOM_SKIP is set. static bool oom_has_pending_victims(struct oom_control *oc) { struct task_struct *p, *tmp; bool ret = false; bool gaveup = false; if (is_sysrq_oom(oc)) return false; /* * Wait for pending victims until __mmput() completes or stalled * too long. */ list_for_each_entry_safe(p, tmp, _victim_list, oom_victim_list) { struct mm_struct *mm = p->signal->oom_mm; if (oom_unkillable_task(p, oc->memcg, oc->nodemask)) continue; ret = true; + /* +* Since memcg OOM allows forced charge, we can safely wait +* until __mmput() completes. +*/ + if (is_memcg_oom(oc)) + return true; #ifdef CONFIG_MMU /* * Since the OOM reaper exists, we can safely wait until * MMF_OOM_SKIP is set. */ if (!test_bit(MMF_OOM_SKIP, >flags)) { if (!oom_reap_target) { get_task_struct(p); oom_reap_target = p; trace_wake_reaper(p->pid); wake_up(_reaper_wait); } #endif continue; } #endif /* We can wait as long as OOM score is decreasing over time. */ if (!victim_mm_stalling(p, mm)) continue; gaveup = true; list_del(>oom_victim_list); /* Drop a reference taken by mark_oom_victim(). */ put_task_struct(p); } if (gaveup) debug_show_all_locks(); return ret; }
Re: WARNING in try_charge
On Tue 07-08-18 05:46:04, Tetsuo Handa wrote: > On 2018/08/07 5:34, Michal Hocko wrote: > > On Tue 07-08-18 05:26:23, Tetsuo Handa wrote: > >> On 2018/08/07 2:56, Michal Hocko wrote: > >>> So the oom victim indeed passed the above force path after the oom > >>> invocation. But later on hit the page fault path and that behaved > >>> differently and for some reason the force path hasn't triggered. I am > >>> wondering how could we hit the page fault path in the first place. The > >>> task is already killed! So what the hell is going on here. > >>> > >>> I must be missing something obvious here. > >>> > >> YOU ARE OBVIOUSLY MISSING MY MAIL! > >> > >> I already said this is "mm, oom: task_will_free_mem(current) should ignore > >> MMF_OOM_SKIP for once." > >> problem which you are refusing at > >> https://www.spinics.net/lists/linux-mm/msg133774.html . > >> And you again ignored my mail. Very sad... > > > > Your suggestion simply didn't make much sense. There is nothing like > > first check is different from the rest. > > > > I don't think your patch is appropriate. It avoids hitting WARN(1) but does > not avoid > unnecessary killing of OOM victims. > > If you look at > https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 , you will > notice that both 23766 and 23767 are killed due to > task_will_free_mem(current) == false. > This is "unnecessary killing of additional processes". Have you noticed the mere detail that the memcg has to kill any task attempting the charge because the hard limit is 0? There is simply no other way around. You cannot charge. There is no unnecessary killing. Full stop. We do allow temporary breach of the hard limit just to let the task die and uncharge on the way out. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Tue 07-08-18 05:46:04, Tetsuo Handa wrote: > On 2018/08/07 5:34, Michal Hocko wrote: > > On Tue 07-08-18 05:26:23, Tetsuo Handa wrote: > >> On 2018/08/07 2:56, Michal Hocko wrote: > >>> So the oom victim indeed passed the above force path after the oom > >>> invocation. But later on hit the page fault path and that behaved > >>> differently and for some reason the force path hasn't triggered. I am > >>> wondering how could we hit the page fault path in the first place. The > >>> task is already killed! So what the hell is going on here. > >>> > >>> I must be missing something obvious here. > >>> > >> YOU ARE OBVIOUSLY MISSING MY MAIL! > >> > >> I already said this is "mm, oom: task_will_free_mem(current) should ignore > >> MMF_OOM_SKIP for once." > >> problem which you are refusing at > >> https://www.spinics.net/lists/linux-mm/msg133774.html . > >> And you again ignored my mail. Very sad... > > > > Your suggestion simply didn't make much sense. There is nothing like > > first check is different from the rest. > > > > I don't think your patch is appropriate. It avoids hitting WARN(1) but does > not avoid > unnecessary killing of OOM victims. > > If you look at > https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 , you will > notice that both 23766 and 23767 are killed due to > task_will_free_mem(current) == false. > This is "unnecessary killing of additional processes". Have you noticed the mere detail that the memcg has to kill any task attempting the charge because the hard limit is 0? There is simply no other way around. You cannot charge. There is no unnecessary killing. Full stop. We do allow temporary breach of the hard limit just to let the task die and uncharge on the way out. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On 2018/08/07 5:34, Michal Hocko wrote: > On Tue 07-08-18 05:26:23, Tetsuo Handa wrote: >> On 2018/08/07 2:56, Michal Hocko wrote: >>> So the oom victim indeed passed the above force path after the oom >>> invocation. But later on hit the page fault path and that behaved >>> differently and for some reason the force path hasn't triggered. I am >>> wondering how could we hit the page fault path in the first place. The >>> task is already killed! So what the hell is going on here. >>> >>> I must be missing something obvious here. >>> >> YOU ARE OBVIOUSLY MISSING MY MAIL! >> >> I already said this is "mm, oom: task_will_free_mem(current) should ignore >> MMF_OOM_SKIP for once." >> problem which you are refusing at >> https://www.spinics.net/lists/linux-mm/msg133774.html . >> And you again ignored my mail. Very sad... > > Your suggestion simply didn't make much sense. There is nothing like > first check is different from the rest. > I don't think your patch is appropriate. It avoids hitting WARN(1) but does not avoid unnecessary killing of OOM victims. If you look at https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 , you will notice that both 23766 and 23767 are killed due to task_will_free_mem(current) == false. This is "unnecessary killing of additional processes". [ 365.869417] syz-executor2 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), order=0, oom_score_adj=0 [ 365.878899] CPU: 0 PID: 23767 Comm: syz-executor2 Not tainted 4.18.0-rc6-next-20180725+ #18 (...snipped...) [ 366.487490] Tasks state (memory values in pages): [ 366.492349] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [ 366.501237] [ 23766] 0 2376617620 8221 1269760 0 syz-executor3 [ 366.510367] [ 23767] 0 2376717618 8218 1269760 0 syz-executor2 [ 366.519409] Memory cgroup out of memory: Kill process 23766 (syz-executor3) score 8252000 or sacrifice child [ 366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB [ 366.540456] oom_reaper: reaped process 23766 (syz-executor3), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB [ 366.550949] syz-executor3 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), order=0, oom_score_adj=0 [ 366.560374] CPU: 1 PID: 23766 Comm: syz-executor3 Not tainted 4.18.0-rc6-next-20180725+ #18 (...snipped...) [ 367.138136] Tasks state (memory values in pages): [ 367.142986] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [ 367.151889] [ 23766] 0 2376617620 8002 1269760 0 syz-executor3 [ 367.160946] [ 23767] 0 2376717618 8218 1269760 0 syz-executor2 [ 367.169994] Memory cgroup out of memory: Kill process 23767 (syz-executor2) score 8249000 or sacrifice child [ 367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB [ 367.192101] oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB [ 367.202986] [ cut here ] [ 367.207845] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. [ 367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 try_charge+0x734/0x1680 [ 367.227540] Kernel panic - not syncing: panic_on_warn set ...
Re: WARNING in try_charge
On 2018/08/07 5:34, Michal Hocko wrote: > On Tue 07-08-18 05:26:23, Tetsuo Handa wrote: >> On 2018/08/07 2:56, Michal Hocko wrote: >>> So the oom victim indeed passed the above force path after the oom >>> invocation. But later on hit the page fault path and that behaved >>> differently and for some reason the force path hasn't triggered. I am >>> wondering how could we hit the page fault path in the first place. The >>> task is already killed! So what the hell is going on here. >>> >>> I must be missing something obvious here. >>> >> YOU ARE OBVIOUSLY MISSING MY MAIL! >> >> I already said this is "mm, oom: task_will_free_mem(current) should ignore >> MMF_OOM_SKIP for once." >> problem which you are refusing at >> https://www.spinics.net/lists/linux-mm/msg133774.html . >> And you again ignored my mail. Very sad... > > Your suggestion simply didn't make much sense. There is nothing like > first check is different from the rest. > I don't think your patch is appropriate. It avoids hitting WARN(1) but does not avoid unnecessary killing of OOM victims. If you look at https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 , you will notice that both 23766 and 23767 are killed due to task_will_free_mem(current) == false. This is "unnecessary killing of additional processes". [ 365.869417] syz-executor2 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), order=0, oom_score_adj=0 [ 365.878899] CPU: 0 PID: 23767 Comm: syz-executor2 Not tainted 4.18.0-rc6-next-20180725+ #18 (...snipped...) [ 366.487490] Tasks state (memory values in pages): [ 366.492349] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [ 366.501237] [ 23766] 0 2376617620 8221 1269760 0 syz-executor3 [ 366.510367] [ 23767] 0 2376717618 8218 1269760 0 syz-executor2 [ 366.519409] Memory cgroup out of memory: Kill process 23766 (syz-executor3) score 8252000 or sacrifice child [ 366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB [ 366.540456] oom_reaper: reaped process 23766 (syz-executor3), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB [ 366.550949] syz-executor3 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), order=0, oom_score_adj=0 [ 366.560374] CPU: 1 PID: 23766 Comm: syz-executor3 Not tainted 4.18.0-rc6-next-20180725+ #18 (...snipped...) [ 367.138136] Tasks state (memory values in pages): [ 367.142986] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name [ 367.151889] [ 23766] 0 2376617620 8002 1269760 0 syz-executor3 [ 367.160946] [ 23767] 0 2376717618 8218 1269760 0 syz-executor2 [ 367.169994] Memory cgroup out of memory: Kill process 23767 (syz-executor2) score 8249000 or sacrifice child [ 367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB [ 367.192101] oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB [ 367.202986] [ cut here ] [ 367.207845] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. [ 367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 try_charge+0x734/0x1680 [ 367.227540] Kernel panic - not syncing: panic_on_warn set ...
Re: WARNING in try_charge
On Tue 07-08-18 05:26:23, Tetsuo Handa wrote: > On 2018/08/07 2:56, Michal Hocko wrote: > > So the oom victim indeed passed the above force path after the oom > > invocation. But later on hit the page fault path and that behaved > > differently and for some reason the force path hasn't triggered. I am > > wondering how could we hit the page fault path in the first place. The > > task is already killed! So what the hell is going on here. > > > > I must be missing something obvious here. > > > YOU ARE OBVIOUSLY MISSING MY MAIL! > > I already said this is "mm, oom: task_will_free_mem(current) should ignore > MMF_OOM_SKIP for once." > problem which you are refusing at > https://www.spinics.net/lists/linux-mm/msg133774.html . > And you again ignored my mail. Very sad... Your suggestion simply didn't make much sense. There is nothing like first check is different from the rest. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Tue 07-08-18 05:26:23, Tetsuo Handa wrote: > On 2018/08/07 2:56, Michal Hocko wrote: > > So the oom victim indeed passed the above force path after the oom > > invocation. But later on hit the page fault path and that behaved > > differently and for some reason the force path hasn't triggered. I am > > wondering how could we hit the page fault path in the first place. The > > task is already killed! So what the hell is going on here. > > > > I must be missing something obvious here. > > > YOU ARE OBVIOUSLY MISSING MY MAIL! > > I already said this is "mm, oom: task_will_free_mem(current) should ignore > MMF_OOM_SKIP for once." > problem which you are refusing at > https://www.spinics.net/lists/linux-mm/msg133774.html . > And you again ignored my mail. Very sad... Your suggestion simply didn't make much sense. There is nothing like first check is different from the rest. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On 2018/08/07 2:56, Michal Hocko wrote: > So the oom victim indeed passed the above force path after the oom > invocation. But later on hit the page fault path and that behaved > differently and for some reason the force path hasn't triggered. I am > wondering how could we hit the page fault path in the first place. The > task is already killed! So what the hell is going on here. > > I must be missing something obvious here. > YOU ARE OBVIOUSLY MISSING MY MAIL! I already said this is "mm, oom: task_will_free_mem(current) should ignore MMF_OOM_SKIP for once." problem which you are refusing at https://www.spinics.net/lists/linux-mm/msg133774.html . And you again ignored my mail. Very sad...
Re: WARNING in try_charge
On 2018/08/07 2:56, Michal Hocko wrote: > So the oom victim indeed passed the above force path after the oom > invocation. But later on hit the page fault path and that behaved > differently and for some reason the force path hasn't triggered. I am > wondering how could we hit the page fault path in the first place. The > task is already killed! So what the hell is going on here. > > I must be missing something obvious here. > YOU ARE OBVIOUSLY MISSING MY MAIL! I already said this is "mm, oom: task_will_free_mem(current) should ignore MMF_OOM_SKIP for once." problem which you are refusing at https://www.spinics.net/lists/linux-mm/msg133774.html . And you again ignored my mail. Very sad...
Re: WARNING in try_charge
On Mon 06-08-18 21:45:53, Michal Hocko wrote: > [CCing Greg - the email thread starts here > http://lkml.kernel.org/r/5e979605729c1...@google.com] now for real > > On Mon 06-08-18 12:12:02, syzbot wrote: > > Hello, > > > > syzbot has tested the proposed patch and the reproducer did not trigger > > crash: > > OK, this is reassuring. Btw Greg has pointed out this potential case > http://lkml.kernel.org/r/xr93in62jy8k@gthelen.svl.corp.google.com > but I simply didn't get what he meant. He was suggesting MMF_OOM_SKIP > but I didn't get why that matters. I didn't think about a race. > > So how about this patch: > From 74d980f8d066d06ada657ebf9b586dbf5668ed26 Mon Sep 17 00:00:00 2001 > From: Michal Hocko > Date: Mon, 6 Aug 2018 21:21:24 +0200 > Subject: [PATCH] memcg, oom: be careful about races when warning about no > reclaimable task > > "memcg, oom: move out_of_memory back to the charge path" has added a > warning triggered when the oom killer cannot find any eligible task > and so there is no way to reclaim the oom memcg under its hard limit. > Further charges for such a memcg are forced and therefore the hard limit > isolation is weakened. > > The current warning is however too eager to trigger even when we are not > really hitting the above condition. Syzbot and Greg Thelen have noticed > that we can hit this condition even when there is still oom victim > pending. E.g. the following race is possible: > > memcg has two tasks taskA, taskB. > > CPU1 (taskA) CPU2CPU3 (taskB) > try_charge > mem_cgroup_out_of_memorytry_charge > select_bad_process(taskB) > oom_kill_processoom_reap_task > # No real memory reaped > > mem_cgroup_out_of_memory > # set taskB -> MMF_OOM_SKIP > # retry charge > mem_cgroup_out_of_memory > oom_lock oom_lock > select_bad_process(self) > oom_kill_process(self) > oom_unlock > # no eligible task > > In fact syzbot test triggered this situation by placing multiple tasks > into a memcg with hard limit set to 0. So no task really had any memory > charged to the memcg > > : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB > mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB > active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB > : Tasks state (memory values in pages): > : [ pid ] uid tgid total_vm rss pgtables_bytes swapents > oom_score_adj name > : [ 6569] 0 6562 94271532480 0 > syz-executor0 > : [ 6576] 0 6576 94260614400 0 > syz-executor6 > : [ 6578] 0 6578 9426 534614400 0 > syz-executor4 > : [ 6579] 0 6579 94260573440 0 > syz-executor5 > : [ 6582] 0 6582 94260614400 0 > syz-executor7 > : [ 6584] 0 6584 94260573440 0 > syz-executor1 > > so in principle there is indeed nothing reclaimable in this memcg and > this looks like a misconfiguration. On the other hand we can clearly > kill all those tasks so it is a bit early to warn and scare users. Do > that by checking that the current is the oom victim and bypass the > warning then. The victim is allowed to force charge and terminate to > release its temporal charge along the way. > > Fixes: "memcg, oom: move out_of_memory back to the charge path" > Noticed-by: Greg Thelen > Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com > Signed-off-by: Michal Hocko > --- > mm/memcontrol.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4603ad75c9a9..1b6eed1bc404 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup > *memcg, gfp_t mask, int > return OOM_ASYNC; > } > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > + if (mem_cgroup_out_of_memory(memcg, mask, order) || > + tsk_is_oom_victim(current)) > return OOM_SUCCESS; > > WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " > -- > 2.18.0 > > -- > Michal Hocko > SUSE Labs -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 21:45:53, Michal Hocko wrote: > [CCing Greg - the email thread starts here > http://lkml.kernel.org/r/5e979605729c1...@google.com] now for real > > On Mon 06-08-18 12:12:02, syzbot wrote: > > Hello, > > > > syzbot has tested the proposed patch and the reproducer did not trigger > > crash: > > OK, this is reassuring. Btw Greg has pointed out this potential case > http://lkml.kernel.org/r/xr93in62jy8k@gthelen.svl.corp.google.com > but I simply didn't get what he meant. He was suggesting MMF_OOM_SKIP > but I didn't get why that matters. I didn't think about a race. > > So how about this patch: > From 74d980f8d066d06ada657ebf9b586dbf5668ed26 Mon Sep 17 00:00:00 2001 > From: Michal Hocko > Date: Mon, 6 Aug 2018 21:21:24 +0200 > Subject: [PATCH] memcg, oom: be careful about races when warning about no > reclaimable task > > "memcg, oom: move out_of_memory back to the charge path" has added a > warning triggered when the oom killer cannot find any eligible task > and so there is no way to reclaim the oom memcg under its hard limit. > Further charges for such a memcg are forced and therefore the hard limit > isolation is weakened. > > The current warning is however too eager to trigger even when we are not > really hitting the above condition. Syzbot and Greg Thelen have noticed > that we can hit this condition even when there is still oom victim > pending. E.g. the following race is possible: > > memcg has two tasks taskA, taskB. > > CPU1 (taskA) CPU2CPU3 (taskB) > try_charge > mem_cgroup_out_of_memorytry_charge > select_bad_process(taskB) > oom_kill_processoom_reap_task > # No real memory reaped > > mem_cgroup_out_of_memory > # set taskB -> MMF_OOM_SKIP > # retry charge > mem_cgroup_out_of_memory > oom_lock oom_lock > select_bad_process(self) > oom_kill_process(self) > oom_unlock > # no eligible task > > In fact syzbot test triggered this situation by placing multiple tasks > into a memcg with hard limit set to 0. So no task really had any memory > charged to the memcg > > : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB > mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB > active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB > : Tasks state (memory values in pages): > : [ pid ] uid tgid total_vm rss pgtables_bytes swapents > oom_score_adj name > : [ 6569] 0 6562 94271532480 0 > syz-executor0 > : [ 6576] 0 6576 94260614400 0 > syz-executor6 > : [ 6578] 0 6578 9426 534614400 0 > syz-executor4 > : [ 6579] 0 6579 94260573440 0 > syz-executor5 > : [ 6582] 0 6582 94260614400 0 > syz-executor7 > : [ 6584] 0 6584 94260573440 0 > syz-executor1 > > so in principle there is indeed nothing reclaimable in this memcg and > this looks like a misconfiguration. On the other hand we can clearly > kill all those tasks so it is a bit early to warn and scare users. Do > that by checking that the current is the oom victim and bypass the > warning then. The victim is allowed to force charge and terminate to > release its temporal charge along the way. > > Fixes: "memcg, oom: move out_of_memory back to the charge path" > Noticed-by: Greg Thelen > Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com > Signed-off-by: Michal Hocko > --- > mm/memcontrol.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4603ad75c9a9..1b6eed1bc404 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup > *memcg, gfp_t mask, int > return OOM_ASYNC; > } > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > + if (mem_cgroup_out_of_memory(memcg, mask, order) || > + tsk_is_oom_victim(current)) > return OOM_SUCCESS; > > WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " > -- > 2.18.0 > > -- > Michal Hocko > SUSE Labs -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
[CCing Greg - the email thread starts here http://lkml.kernel.org/r/5e979605729c1...@google.com] On Mon 06-08-18 12:12:02, syzbot wrote: > Hello, > > syzbot has tested the proposed patch and the reproducer did not trigger > crash: OK, this is reassuring. Btw Greg has pointed out this potential case http://lkml.kernel.org/r/xr93in62jy8k@gthelen.svl.corp.google.com but I simply didn't get what he meant. He was suggesting MMF_OOM_SKIP but I didn't get why that matters. I didn't think about a race. So how about this patch: >From 74d980f8d066d06ada657ebf9b586dbf5668ed26 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 6 Aug 2018 21:21:24 +0200 Subject: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task "memcg, oom: move out_of_memory back to the charge path" has added a warning triggered when the oom killer cannot find any eligible task and so there is no way to reclaim the oom memcg under its hard limit. Further charges for such a memcg are forced and therefore the hard limit isolation is weakened. The current warning is however too eager to trigger even when we are not really hitting the above condition. Syzbot and Greg Thelen have noticed that we can hit this condition even when there is still oom victim pending. E.g. the following race is possible: memcg has two tasks taskA, taskB. CPU1 (taskA)CPU2CPU3 (taskB) try_charge mem_cgroup_out_of_memory try_charge select_bad_process(taskB) oom_kill_process oom_reap_task # No real memory reaped mem_cgroup_out_of_memory # set taskB -> MMF_OOM_SKIP # retry charge mem_cgroup_out_of_memory oom_lockoom_lock select_bad_process(self) oom_kill_process(self) oom_unlock # no eligible task In fact syzbot test triggered this situation by placing multiple tasks into a memcg with hard limit set to 0. So no task really had any memory charged to the memcg : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB : Tasks state (memory values in pages): : [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 6569] 0 6562 94271532480 0 syz-executor0 : [ 6576] 0 6576 94260614400 0 syz-executor6 : [ 6578] 0 6578 9426 534614400 0 syz-executor4 : [ 6579] 0 6579 94260573440 0 syz-executor5 : [ 6582] 0 6582 94260614400 0 syz-executor7 : [ 6584] 0 6584 94260573440 0 syz-executor1 so in principle there is indeed nothing reclaimable in this memcg and this looks like a misconfiguration. On the other hand we can clearly kill all those tasks so it is a bit early to warn and scare users. Do that by checking that the current is the oom victim and bypass the warning then. The victim is allowed to force charge and terminate to release its temporal charge along the way. Fixes: "memcg, oom: move out_of_memory back to the charge path" Noticed-by: Greg Thelen Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com Signed-off-by: Michal Hocko --- mm/memcontrol.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..1b6eed1bc404 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int return OOM_ASYNC; } - if (mem_cgroup_out_of_memory(memcg, mask, order)) + if (mem_cgroup_out_of_memory(memcg, mask, order) || + tsk_is_oom_victim(current)) return OOM_SUCCESS; WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " -- 2.18.0 -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
[CCing Greg - the email thread starts here http://lkml.kernel.org/r/5e979605729c1...@google.com] On Mon 06-08-18 12:12:02, syzbot wrote: > Hello, > > syzbot has tested the proposed patch and the reproducer did not trigger > crash: OK, this is reassuring. Btw Greg has pointed out this potential case http://lkml.kernel.org/r/xr93in62jy8k@gthelen.svl.corp.google.com but I simply didn't get what he meant. He was suggesting MMF_OOM_SKIP but I didn't get why that matters. I didn't think about a race. So how about this patch: >From 74d980f8d066d06ada657ebf9b586dbf5668ed26 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Mon, 6 Aug 2018 21:21:24 +0200 Subject: [PATCH] memcg, oom: be careful about races when warning about no reclaimable task "memcg, oom: move out_of_memory back to the charge path" has added a warning triggered when the oom killer cannot find any eligible task and so there is no way to reclaim the oom memcg under its hard limit. Further charges for such a memcg are forced and therefore the hard limit isolation is weakened. The current warning is however too eager to trigger even when we are not really hitting the above condition. Syzbot and Greg Thelen have noticed that we can hit this condition even when there is still oom victim pending. E.g. the following race is possible: memcg has two tasks taskA, taskB. CPU1 (taskA)CPU2CPU3 (taskB) try_charge mem_cgroup_out_of_memory try_charge select_bad_process(taskB) oom_kill_process oom_reap_task # No real memory reaped mem_cgroup_out_of_memory # set taskB -> MMF_OOM_SKIP # retry charge mem_cgroup_out_of_memory oom_lockoom_lock select_bad_process(self) oom_kill_process(self) oom_unlock # no eligible task In fact syzbot test triggered this situation by placing multiple tasks into a memcg with hard limit set to 0. So no task really had any memory charged to the memcg : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB : Tasks state (memory values in pages): : [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 6569] 0 6562 94271532480 0 syz-executor0 : [ 6576] 0 6576 94260614400 0 syz-executor6 : [ 6578] 0 6578 9426 534614400 0 syz-executor4 : [ 6579] 0 6579 94260573440 0 syz-executor5 : [ 6582] 0 6582 94260614400 0 syz-executor7 : [ 6584] 0 6584 94260573440 0 syz-executor1 so in principle there is indeed nothing reclaimable in this memcg and this looks like a misconfiguration. On the other hand we can clearly kill all those tasks so it is a bit early to warn and scare users. Do that by checking that the current is the oom victim and bypass the warning then. The victim is allowed to force charge and terminate to release its temporal charge along the way. Fixes: "memcg, oom: move out_of_memory back to the charge path" Noticed-by: Greg Thelen Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com Signed-off-by: Michal Hocko --- mm/memcontrol.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..1b6eed1bc404 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int return OOM_ASYNC; } - if (mem_cgroup_out_of_memory(memcg, mask, order)) + if (mem_cgroup_out_of_memory(memcg, mask, order) || + tsk_is_oom_victim(current)) return OOM_SUCCESS; WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " -- 2.18.0 -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
Hello, syzbot has tested the proposed patch and the reproducer did not trigger crash: Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com Tested on: commit: 116b181bb646 Add linux-next specific files for 20180803 git tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git kernel config: https://syzkaller.appspot.com/x/.config?x=b4f38be7c2c519d5 compiler: gcc (GCC) 8.0.1 20180413 (experimental) patch: https://syzkaller.appspot.com/x/patch.diff?x=147b3a7240 Note: testing is done by a robot and is best-effort only.
Re: WARNING in try_charge
Hello, syzbot has tested the proposed patch and the reproducer did not trigger crash: Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com Tested on: commit: 116b181bb646 Add linux-next specific files for 20180803 git tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git kernel config: https://syzkaller.appspot.com/x/.config?x=b4f38be7c2c519d5 compiler: gcc (GCC) 8.0.1 20180413 (experimental) patch: https://syzkaller.appspot.com/x/patch.diff?x=147b3a7240 Note: testing is done by a robot and is best-effort only.
Re: WARNING in try_charge
The debugging patch was wrong but I guess I see it finally. It's a race : [ 72.901666] Memory cgroup out of memory: Kill process 6584 (syz-executor1) score 55 or sacrifice child : [ 72.917037] Killed process 6584 (syz-executor1) total-vm:37704kB, anon-rss:2140kB, file-rss:0kB, shmem-rss:0kB : [ 72.927256] task=syz-executor5 pid=6581 charge bypass : [ 72.928046] oom_reaper: reaped process 6584 (syz-executor1), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB : [ 72.932818] task=syz-executor6 pid=6576 invoked memcg oom killer. oom_victim=1 : [ 72.942790] task=syz-executor5 pid=6581 charge for nr_pages=1 : [ 72.949769] syz-executor6 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 : [ 72.955606] task=syz-executor5 pid=6581 charge bypass : [ 72.967394] syz-executor6 cpuset=/ mems_allowed=0 : [ 72.973175] task=syz-executor5 pid=6581 charge for nr_pages=1 : [...] : [ 73.534865] Task in /ile0 killed as a result of limit of /ile0 : [ 73.540865] memory: usage 76kB, limit 0kB, failcnt 260 : [ 73.546142] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 73.552898] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 73.559051] Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB : [ 73.578533] Tasks state (memory values in pages): : [ 73.583404] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 73.592277] [ 6569] 0 6562 94271532480 0 syz-executor0 : [ 73.601299] [ 6576] 0 6576 94260614400 0 syz-executor6 : [ 73.610333] [ 6578] 0 6578 9426 534614400 0 syz-executor4 : [ 73.619381] [ 6579] 0 6579 94260573440 0 syz-executor5 : [ 73.628414] [ 6582] 0 6582 94260614400 0 syz-executor7 : [ 73.637441] [ 6584] 0 6584 94260573440 0 syz-executor1 : [ 73.646464] Memory cgroup out of memory: Kill process 6578 (syz-executor4) score 549000 or sacrifice child : [ 73.656295] task=syz-executor6 pid=6576 is oom victim now This should be 6578 but we at least know that we are running in 6576 context so the we are setting the state from a remote context which itself has been killed already : [ 73.661841] Killed process 6578 (syz-executor4) total-vm:37704kB, anon-rss:2136kB, file-rss:0kB, shmem-rss:0kB : [ 73.672035] task=syz-executor6 pid=6576 charge bypass : [ 73.672801] oom_reaper: reaped process 6578 (syz-executor4), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB : [ 73.678829] task=syz-executor4 pid=6578 invoked memcg oom killer. oom_victim=1 and here the victim finally reached the oom path finally. : [ 73.687453] task=syz-executor6 pid=6576 charge for nr_pages=1 : [ 73.694534] [ cut here ] : [ 73.700424] task=syz-executor6 pid=6576 charge bypass : [ 73.705175] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. : [ 73.705321] WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 try_charge+0xafa/0x1710 But there is nobody killable. So the oom kill happened _after_ our force charge path. Therefore we should do the following regardless whether we make tis warn or pr_$foo #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 116b181bb646afedd770985de20a68721bdb2648 diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..1b6eed1bc404 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int return OOM_ASYNC; } - if (mem_cgroup_out_of_memory(memcg, mask, order)) + if (mem_cgroup_out_of_memory(memcg, mask, order) || + tsk_is_oom_victim(current)) return OOM_SUCCESS; WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
The debugging patch was wrong but I guess I see it finally. It's a race : [ 72.901666] Memory cgroup out of memory: Kill process 6584 (syz-executor1) score 55 or sacrifice child : [ 72.917037] Killed process 6584 (syz-executor1) total-vm:37704kB, anon-rss:2140kB, file-rss:0kB, shmem-rss:0kB : [ 72.927256] task=syz-executor5 pid=6581 charge bypass : [ 72.928046] oom_reaper: reaped process 6584 (syz-executor1), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB : [ 72.932818] task=syz-executor6 pid=6576 invoked memcg oom killer. oom_victim=1 : [ 72.942790] task=syz-executor5 pid=6581 charge for nr_pages=1 : [ 72.949769] syz-executor6 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 : [ 72.955606] task=syz-executor5 pid=6581 charge bypass : [ 72.967394] syz-executor6 cpuset=/ mems_allowed=0 : [ 72.973175] task=syz-executor5 pid=6581 charge for nr_pages=1 : [...] : [ 73.534865] Task in /ile0 killed as a result of limit of /ile0 : [ 73.540865] memory: usage 76kB, limit 0kB, failcnt 260 : [ 73.546142] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 73.552898] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 73.559051] Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB : [ 73.578533] Tasks state (memory values in pages): : [ 73.583404] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 73.592277] [ 6569] 0 6562 94271532480 0 syz-executor0 : [ 73.601299] [ 6576] 0 6576 94260614400 0 syz-executor6 : [ 73.610333] [ 6578] 0 6578 9426 534614400 0 syz-executor4 : [ 73.619381] [ 6579] 0 6579 94260573440 0 syz-executor5 : [ 73.628414] [ 6582] 0 6582 94260614400 0 syz-executor7 : [ 73.637441] [ 6584] 0 6584 94260573440 0 syz-executor1 : [ 73.646464] Memory cgroup out of memory: Kill process 6578 (syz-executor4) score 549000 or sacrifice child : [ 73.656295] task=syz-executor6 pid=6576 is oom victim now This should be 6578 but we at least know that we are running in 6576 context so the we are setting the state from a remote context which itself has been killed already : [ 73.661841] Killed process 6578 (syz-executor4) total-vm:37704kB, anon-rss:2136kB, file-rss:0kB, shmem-rss:0kB : [ 73.672035] task=syz-executor6 pid=6576 charge bypass : [ 73.672801] oom_reaper: reaped process 6578 (syz-executor4), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB : [ 73.678829] task=syz-executor4 pid=6578 invoked memcg oom killer. oom_victim=1 and here the victim finally reached the oom path finally. : [ 73.687453] task=syz-executor6 pid=6576 charge for nr_pages=1 : [ 73.694534] [ cut here ] : [ 73.700424] task=syz-executor6 pid=6576 charge bypass : [ 73.705175] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. : [ 73.705321] WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 try_charge+0xafa/0x1710 But there is nobody killable. So the oom kill happened _after_ our force charge path. Therefore we should do the following regardless whether we make tis warn or pr_$foo #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 116b181bb646afedd770985de20a68721bdb2648 diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..1b6eed1bc404 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int return OOM_ASYNC; } - if (mem_cgroup_out_of_memory(memcg, mask, order)) + if (mem_cgroup_out_of_memory(memcg, mask, order) || + tsk_is_oom_victim(current)) return OOM_SUCCESS; WARN(1,"Memory cgroup charge failed because of no reclaimable memory! " -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 20:13:39, Michal Hocko wrote: > I simply do not see how this is possible. Let's try with the following > extended debugging patch. > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > 116b181bb646afedd770985de20a68721bdb2648 > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4603ad75c9a9..e2dfdf4361ba 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup > *memcg, gfp_t gfp_mask, > bool ret; > > mutex_lock(_lock); > + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", > + current->comm, current->pid, > tsk_is_oom_victim(current)); > ret = out_of_memory(); > mutex_unlock(_lock); > return ret; > @@ -2108,6 +2110,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t > gfp_mask, > > if (mem_cgroup_is_root(memcg)) > return 0; > + if (tsk_is_oom_victim(current)) > + pr_info("task=%s pid=%d charge for nr_pages=%d\n", > + current->comm, current->pid, nr_pages); > retry: > if (consume_stock(memcg, nr_pages)) > return 0; > @@ -2137,8 +2142,11 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t > gfp_mask, >*/ > if (unlikely(tsk_is_oom_victim(current) || >fatal_signal_pending(current) || > - current->flags & PF_EXITING)) > + current->flags & PF_EXITING)) { > + pr_info("task=%s pid=%d charge bypass\n", > + current->comm, current->pid); > goto force; > + } > > /* >* Prevent unbounded recursion when reclaim operations need to > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 104ef4a01a55..7d9adcde8cf6 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -692,6 +692,8 @@ static void mark_oom_victim(struct task_struct *tsk) > __thaw_task(tsk); > atomic_inc(_victims); > trace_mark_victim(tsk->pid); > + pr_info("task=%s pid=%d is oom victim now\n", > + current->comm, current->pid); > } Ble this should be s@current@tsk@g. But it didn't really help. I will have to think about this more. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 20:13:39, Michal Hocko wrote: > I simply do not see how this is possible. Let's try with the following > extended debugging patch. > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > 116b181bb646afedd770985de20a68721bdb2648 > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4603ad75c9a9..e2dfdf4361ba 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup > *memcg, gfp_t gfp_mask, > bool ret; > > mutex_lock(_lock); > + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", > + current->comm, current->pid, > tsk_is_oom_victim(current)); > ret = out_of_memory(); > mutex_unlock(_lock); > return ret; > @@ -2108,6 +2110,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t > gfp_mask, > > if (mem_cgroup_is_root(memcg)) > return 0; > + if (tsk_is_oom_victim(current)) > + pr_info("task=%s pid=%d charge for nr_pages=%d\n", > + current->comm, current->pid, nr_pages); > retry: > if (consume_stock(memcg, nr_pages)) > return 0; > @@ -2137,8 +2142,11 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t > gfp_mask, >*/ > if (unlikely(tsk_is_oom_victim(current) || >fatal_signal_pending(current) || > - current->flags & PF_EXITING)) > + current->flags & PF_EXITING)) { > + pr_info("task=%s pid=%d charge bypass\n", > + current->comm, current->pid); > goto force; > + } > > /* >* Prevent unbounded recursion when reclaim operations need to > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > index 104ef4a01a55..7d9adcde8cf6 100644 > --- a/mm/oom_kill.c > +++ b/mm/oom_kill.c > @@ -692,6 +692,8 @@ static void mark_oom_victim(struct task_struct *tsk) > __thaw_task(tsk); > atomic_inc(_victims); > trace_mark_victim(tsk->pid); > + pr_info("task=%s pid=%d is oom victim now\n", > + current->comm, current->pid); > } Ble this should be s@current@tsk@g. But it didn't really help. I will have to think about this more. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
Hello, syzbot has tested the proposed patch but the reproducer still triggered crash: WARNING in try_charge task=syz-executor4 pid=6578 invoked memcg oom killer. oom_victim=1 task=syz-executor6 pid=6576 charge for nr_pages=1 [ cut here ] task=syz-executor6 pid=6576 charge bypass Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 mem_cgroup_oom mm/memcontrol.c:1706 [inline] WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 try_charge+0xafa/0x1710 mm/memcontrol.c:2270 task=syz-executor6 pid=6576 charge for nr_pages=1 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 6578 Comm: syz-executor4 Not tainted 4.18.0-rc7-next-20180803+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 task=syz-executor6 pid=6576 charge bypass panic+0x238/0x4e7 kernel/panic.c:184 task=syz-executor6 pid=6576 charge for nr_pages=1 __warn.cold.8+0x163/0x1ba kernel/panic.c:536 task=syz-executor6 pid=6576 charge bypass report_bug+0x252/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296 task=syz-executor6 pid=6576 charge for nr_pages=1 task=syz-executor6 pid=6576 charge bypass do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:996 RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline] RIP: 0010:try_charge+0xafa/0x1710 mm/memcontrol.c:2270 task=syz-executor6 pid=6576 charge for nr_pages=1 Code: 85 4a 01 00 00 8b b5 bc fd ff ff 44 89 f2 4c 89 ff e8 3a 4d ff ff 84 c0 0f 85 9b 04 00 00 48 c7 c7 a0 18 13 87 e8 86 f9 85 ff <0f> 0b 48 8b 95 d0 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00 RSP: 0018:8801af5bf458 EFLAGS: 00010282 RAX: RBX: dc00 RCX: RDX: RSI: 816366f1 RDI: 8801af5bf148 task=syz-executor6 pid=6576 charge bypass RBP: 8801af5bf6f0 R08: 8801aeae2540 R09: fbfff0ff11fc R10: fbfff0ff11fc R11: 87f88fe3 R12: 8801b6bd4800 R13: 8801af5bf6c8 R14: R15: 8801b6bd4800 task=syz-executor5 pid=6579 invoked memcg oom killer. oom_victim=1 [ cut here ] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. memcg_kmem_charge_memcg+0x7c/0x120 mm/memcontrol.c:2600 WARNING: CPU: 0 PID: 6579 at mm/memcontrol.c:1707 mem_cgroup_oom mm/memcontrol.c:1706 [inline] WARNING: CPU: 0 PID: 6579 at mm/memcontrol.c:1707 try_charge+0xafa/0x1710 mm/memcontrol.c:2270 memcg_charge_slab mm/slab.h:283 [inline] kmem_getpages mm/slab.c:1415 [inline] cache_grow_begin+0x207/0x710 mm/slab.c:2677 Modules linked in: fallback_alloc+0x203/0x2c0 mm/slab.c:3219 CPU: 0 PID: 6579 Comm: syz-executor5 Not tainted 4.18.0-rc7-next-20180803+ #1 cache_alloc_node+0x1c7/0x1e0 mm/slab.c:3287 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 __do_cache_alloc mm/slab.c:3356 [inline] slab_alloc mm/slab.c:3384 [inline] kmem_cache_alloc+0x1e5/0x760 mm/slab.c:3552 RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline] RIP: 0010:try_charge+0xafa/0x1710 mm/memcontrol.c:2270 anon_vma_chain_alloc mm/rmap.c:129 [inline] __anon_vma_prepare+0xc4/0x720 mm/rmap.c:183 Code: 85 4a 01 00 00 8b b5 bc fd ff ff 44 89 f2 4c 89 ff e8 3a 4d ff ff 84 c0 0f 85 9b 04 00 00 48 c7 c7 a0 18 13 87 e8 86 f9 85 ff <0f> 0b 48 8b 95 d0 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00 RSP: 0018:8801c82f7458 EFLAGS: 00010282 RAX: RBX: dc00 RCX: RDX: RSI: 816366f1 RDI: 8801c82f7148 RBP: 8801c82f76f0 R08: 8801ae6905c0 R09: fbfff0ff11fc R10: fbfff0ff11fc R11: 87f88fe3 R12: 8801b6bd4800 anon_vma_prepare include/linux/rmap.h:153 [inline] do_anonymous_page mm/memory.c:3160 [inline] handle_pte_fault mm/memory.c:3971 [inline] __handle_mm_fault+0x3556/0x4470 mm/memory.c:4097 R13: 8801c82f76c8 R14: R15: 8801b6bd4800 FS: 00b19940() GS:8801db00() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 01dc8978 CR3: 0001c83bc000 CR4: 001406f0 Call Trace: memcg_kmem_charge_memcg+0x7c/0x120 mm/memcontrol.c:2600 memcg_charge_slab mm/slab.h:283 [inline] kmem_getpages mm/slab.c:1415 [inline] cache_grow_begin+0x207/0x710 mm/slab.c:2677 handle_mm_fault+0x53e/0xc80 mm/memory.c:4134 fallback_alloc+0x203/0x2c0 mm/slab.c:3219 cache_alloc_node+0x1c7/0x1e0 mm/slab.c:3287 __do_page_fault+0x620/0xe50 arch/x86/mm/fault.c:1395 __do_cache_alloc mm/slab.c:3356 [inline] slab_alloc mm/slab.c:3384 [inline] kmem_cache_alloc
Re: WARNING in try_charge
Hello, syzbot has tested the proposed patch but the reproducer still triggered crash: WARNING in try_charge task=syz-executor4 pid=6578 invoked memcg oom killer. oom_victim=1 task=syz-executor6 pid=6576 charge for nr_pages=1 [ cut here ] task=syz-executor6 pid=6576 charge bypass Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 mem_cgroup_oom mm/memcontrol.c:1706 [inline] WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 try_charge+0xafa/0x1710 mm/memcontrol.c:2270 task=syz-executor6 pid=6576 charge for nr_pages=1 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 6578 Comm: syz-executor4 Not tainted 4.18.0-rc7-next-20180803+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 task=syz-executor6 pid=6576 charge bypass panic+0x238/0x4e7 kernel/panic.c:184 task=syz-executor6 pid=6576 charge for nr_pages=1 __warn.cold.8+0x163/0x1ba kernel/panic.c:536 task=syz-executor6 pid=6576 charge bypass report_bug+0x252/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296 task=syz-executor6 pid=6576 charge for nr_pages=1 task=syz-executor6 pid=6576 charge bypass do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:996 RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline] RIP: 0010:try_charge+0xafa/0x1710 mm/memcontrol.c:2270 task=syz-executor6 pid=6576 charge for nr_pages=1 Code: 85 4a 01 00 00 8b b5 bc fd ff ff 44 89 f2 4c 89 ff e8 3a 4d ff ff 84 c0 0f 85 9b 04 00 00 48 c7 c7 a0 18 13 87 e8 86 f9 85 ff <0f> 0b 48 8b 95 d0 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00 RSP: 0018:8801af5bf458 EFLAGS: 00010282 RAX: RBX: dc00 RCX: RDX: RSI: 816366f1 RDI: 8801af5bf148 task=syz-executor6 pid=6576 charge bypass RBP: 8801af5bf6f0 R08: 8801aeae2540 R09: fbfff0ff11fc R10: fbfff0ff11fc R11: 87f88fe3 R12: 8801b6bd4800 R13: 8801af5bf6c8 R14: R15: 8801b6bd4800 task=syz-executor5 pid=6579 invoked memcg oom killer. oom_victim=1 [ cut here ] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. memcg_kmem_charge_memcg+0x7c/0x120 mm/memcontrol.c:2600 WARNING: CPU: 0 PID: 6579 at mm/memcontrol.c:1707 mem_cgroup_oom mm/memcontrol.c:1706 [inline] WARNING: CPU: 0 PID: 6579 at mm/memcontrol.c:1707 try_charge+0xafa/0x1710 mm/memcontrol.c:2270 memcg_charge_slab mm/slab.h:283 [inline] kmem_getpages mm/slab.c:1415 [inline] cache_grow_begin+0x207/0x710 mm/slab.c:2677 Modules linked in: fallback_alloc+0x203/0x2c0 mm/slab.c:3219 CPU: 0 PID: 6579 Comm: syz-executor5 Not tainted 4.18.0-rc7-next-20180803+ #1 cache_alloc_node+0x1c7/0x1e0 mm/slab.c:3287 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 __do_cache_alloc mm/slab.c:3356 [inline] slab_alloc mm/slab.c:3384 [inline] kmem_cache_alloc+0x1e5/0x760 mm/slab.c:3552 RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline] RIP: 0010:try_charge+0xafa/0x1710 mm/memcontrol.c:2270 anon_vma_chain_alloc mm/rmap.c:129 [inline] __anon_vma_prepare+0xc4/0x720 mm/rmap.c:183 Code: 85 4a 01 00 00 8b b5 bc fd ff ff 44 89 f2 4c 89 ff e8 3a 4d ff ff 84 c0 0f 85 9b 04 00 00 48 c7 c7 a0 18 13 87 e8 86 f9 85 ff <0f> 0b 48 8b 95 d0 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00 RSP: 0018:8801c82f7458 EFLAGS: 00010282 RAX: RBX: dc00 RCX: RDX: RSI: 816366f1 RDI: 8801c82f7148 RBP: 8801c82f76f0 R08: 8801ae6905c0 R09: fbfff0ff11fc R10: fbfff0ff11fc R11: 87f88fe3 R12: 8801b6bd4800 anon_vma_prepare include/linux/rmap.h:153 [inline] do_anonymous_page mm/memory.c:3160 [inline] handle_pte_fault mm/memory.c:3971 [inline] __handle_mm_fault+0x3556/0x4470 mm/memory.c:4097 R13: 8801c82f76c8 R14: R15: 8801b6bd4800 FS: 00b19940() GS:8801db00() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 01dc8978 CR3: 0001c83bc000 CR4: 001406f0 Call Trace: memcg_kmem_charge_memcg+0x7c/0x120 mm/memcontrol.c:2600 memcg_charge_slab mm/slab.h:283 [inline] kmem_getpages mm/slab.c:1415 [inline] cache_grow_begin+0x207/0x710 mm/slab.c:2677 handle_mm_fault+0x53e/0xc80 mm/memory.c:4134 fallback_alloc+0x203/0x2c0 mm/slab.c:3219 cache_alloc_node+0x1c7/0x1e0 mm/slab.c:3287 __do_page_fault+0x620/0xe50 arch/x86/mm/fault.c:1395 __do_cache_alloc mm/slab.c:3356 [inline] slab_alloc mm/slab.c:3384 [inline] kmem_cache_alloc
Re: WARNING in try_charge
I simply do not see how this is possible. Let's try with the following extended debugging patch. #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 116b181bb646afedd770985de20a68721bdb2648 diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..e2dfdf4361ba 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, bool ret; mutex_lock(_lock); + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", + current->comm, current->pid, tsk_is_oom_victim(current)); ret = out_of_memory(); mutex_unlock(_lock); return ret; @@ -2108,6 +2110,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, if (mem_cgroup_is_root(memcg)) return 0; + if (tsk_is_oom_victim(current)) + pr_info("task=%s pid=%d charge for nr_pages=%d\n", + current->comm, current->pid, nr_pages); retry: if (consume_stock(memcg, nr_pages)) return 0; @@ -2137,8 +2142,11 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, */ if (unlikely(tsk_is_oom_victim(current) || fatal_signal_pending(current) || -current->flags & PF_EXITING)) +current->flags & PF_EXITING)) { + pr_info("task=%s pid=%d charge bypass\n", + current->comm, current->pid); goto force; + } /* * Prevent unbounded recursion when reclaim operations need to diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 104ef4a01a55..7d9adcde8cf6 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -692,6 +692,8 @@ static void mark_oom_victim(struct task_struct *tsk) __thaw_task(tsk); atomic_inc(_victims); trace_mark_victim(tsk->pid); + pr_info("task=%s pid=%d is oom victim now\n", + current->comm, current->pid); } /** -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
I simply do not see how this is possible. Let's try with the following extended debugging patch. #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 116b181bb646afedd770985de20a68721bdb2648 diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..e2dfdf4361ba 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, bool ret; mutex_lock(_lock); + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", + current->comm, current->pid, tsk_is_oom_victim(current)); ret = out_of_memory(); mutex_unlock(_lock); return ret; @@ -2108,6 +2110,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, if (mem_cgroup_is_root(memcg)) return 0; + if (tsk_is_oom_victim(current)) + pr_info("task=%s pid=%d charge for nr_pages=%d\n", + current->comm, current->pid, nr_pages); retry: if (consume_stock(memcg, nr_pages)) return 0; @@ -2137,8 +2142,11 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask, */ if (unlikely(tsk_is_oom_victim(current) || fatal_signal_pending(current) || -current->flags & PF_EXITING)) +current->flags & PF_EXITING)) { + pr_info("task=%s pid=%d charge bypass\n", + current->comm, current->pid); goto force; + } /* * Prevent unbounded recursion when reclaim operations need to diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 104ef4a01a55..7d9adcde8cf6 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -692,6 +692,8 @@ static void mark_oom_victim(struct task_struct *tsk) __thaw_task(tsk); atomic_inc(_victims); trace_mark_victim(tsk->pid); + pr_info("task=%s pid=%d is oom victim now\n", + current->comm, current->pid); } /** -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 19:44:10, Michal Hocko wrote: > On Mon 06-08-18 08:42:02, syzbot wrote: > > Hello, > > > > syzbot has tested the proposed patch but the reproducer still triggered > > crash: > > WARNING in try_charge > > > > Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, > > file-rss:0kB, shmem-rss:0kB > > oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, > > file-rss:0kB, shmem-rss:0kB > > task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 > > Thank you. This is useful. The full oom picture is this > : [ 65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. > oom_victim=0 > [...] > : [ 65.920355] Task in /ile0 killed as a result of limit of /ile0 > : [ 65.926389] memory: usage 0kB, limit 0kB, failcnt 20 > : [ 65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB > rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB > inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB > unevictable:0KB > : [ 65.963878] Tasks state (memory values in pages): > : [ 65.968743] [ pid ] uid tgid total_vm rss pgtables_bytes > swapents oom_score_adj name > : [ 65.977615] [ 6410] 0 6410 9427 532614400 > 0 syz-executor5 > : [ 65.986647] Memory cgroup out of memory: Kill process 6410 > (syz-executor5) score 547000 or sacrifice child > : [ 65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, > anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB > : [ 66.007471] oom_reaper: reaped process 6410 (syz-executor5), now > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > : [ 66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. > oom_victim=1 > : [ 66.025137] [ cut here ] > : [ 66.029927] Memory cgroup charge failed because of no reclaimable > memory! This looks like a misconfiguration or a kernel bug. > : [ 66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 > try_charge+0x734/0x1680 > > So we have only a single task in the memcg and it is this task which > triggers the OOM. It gets killed and oom_reaped. This means that > out_of_memory should return with true and so we should retry and force > the charge as I've already mentioned. For some reason this task has > triggered the oom killer path again and then we haven't found any > eligible task and resulted in the warning. This shouldn't happen. > > I will stare to the code some more to see how the heck we get there > without passing > if (unlikely(tsk_is_oom_victim(current) || >fatal_signal_pending(current) || >current->flags & PF_EXITING)) > goto force; Hmm, so while the OOM killer was invoked from [ 65.405905] Call Trace: [ 65.408498] dump_stack+0x1c9/0x2b4 [ 65.421606] dump_header+0x27b/0xf70 [ 65.545094] oom_kill_process.cold.28+0x10/0x95a [ 65.605696] out_of_memory+0xa8a/0x14d0 [ 65.627227] mem_cgroup_out_of_memory+0x213/0x300 [ 65.641293] try_charge+0x720/0x1680 [ 65.674806] memcg_kmem_charge_memcg+0x7c/0x120 [ 65.687939] cache_grow_begin+0x207/0x710 [ 65.696553] fallback_alloc+0x203/0x2c0 [ 65.700519] cache_alloc_node+0x1c7/0x1e0 [ 65.704999] kmem_cache_alloc+0x1e5/0x760 [ 65.717947] shmem_alloc_inode+0x1b/0x40 [ 65.722003] alloc_inode+0x63/0x190 [ 65.725642] new_inode_pseudo+0x71/0x1a0 [ 65.738077] new_inode+0x1c/0x40 [ 65.741432] shmem_get_inode+0xf1/0x910 [ 65.771550] __shmem_file_setup.part.48+0x83/0x2a0 [ 65.776482] shmem_file_setup+0x65/0x90 [ 65.780444] __x64_sys_memfd_create+0x2af/0x4f0 The warning happened from a different path [ 66.151455] RIP: 0010:try_charge+0x734/0x1680 [...] [ 66.270886] mem_cgroup_try_charge+0x4ff/0xa70 [ 66.305602] mem_cgroup_try_charge_delay+0x1d/0x90 [ 66.310514] __handle_mm_fault+0x25be/0x4470 [ 66.366608] handle_mm_fault+0x53e/0xc80 [ 66.402384] do_page_fault+0xf6/0x8c0 [ 66.451629] page_fault+0x1e/0x30 So the oom victim indeed passed the above force path after the oom invocation. But later on hit the page fault path and that behaved differently and for some reason the force path hasn't triggered. I am wondering how could we hit the page fault path in the first place. The task is already killed! So what the hell is going on here. I must be missing something obvious here. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 19:44:10, Michal Hocko wrote: > On Mon 06-08-18 08:42:02, syzbot wrote: > > Hello, > > > > syzbot has tested the proposed patch but the reproducer still triggered > > crash: > > WARNING in try_charge > > > > Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, > > file-rss:0kB, shmem-rss:0kB > > oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, > > file-rss:0kB, shmem-rss:0kB > > task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 > > Thank you. This is useful. The full oom picture is this > : [ 65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. > oom_victim=0 > [...] > : [ 65.920355] Task in /ile0 killed as a result of limit of /ile0 > : [ 65.926389] memory: usage 0kB, limit 0kB, failcnt 20 > : [ 65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB > rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB > inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB > unevictable:0KB > : [ 65.963878] Tasks state (memory values in pages): > : [ 65.968743] [ pid ] uid tgid total_vm rss pgtables_bytes > swapents oom_score_adj name > : [ 65.977615] [ 6410] 0 6410 9427 532614400 > 0 syz-executor5 > : [ 65.986647] Memory cgroup out of memory: Kill process 6410 > (syz-executor5) score 547000 or sacrifice child > : [ 65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, > anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB > : [ 66.007471] oom_reaper: reaped process 6410 (syz-executor5), now > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > : [ 66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. > oom_victim=1 > : [ 66.025137] [ cut here ] > : [ 66.029927] Memory cgroup charge failed because of no reclaimable > memory! This looks like a misconfiguration or a kernel bug. > : [ 66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 > try_charge+0x734/0x1680 > > So we have only a single task in the memcg and it is this task which > triggers the OOM. It gets killed and oom_reaped. This means that > out_of_memory should return with true and so we should retry and force > the charge as I've already mentioned. For some reason this task has > triggered the oom killer path again and then we haven't found any > eligible task and resulted in the warning. This shouldn't happen. > > I will stare to the code some more to see how the heck we get there > without passing > if (unlikely(tsk_is_oom_victim(current) || >fatal_signal_pending(current) || >current->flags & PF_EXITING)) > goto force; Hmm, so while the OOM killer was invoked from [ 65.405905] Call Trace: [ 65.408498] dump_stack+0x1c9/0x2b4 [ 65.421606] dump_header+0x27b/0xf70 [ 65.545094] oom_kill_process.cold.28+0x10/0x95a [ 65.605696] out_of_memory+0xa8a/0x14d0 [ 65.627227] mem_cgroup_out_of_memory+0x213/0x300 [ 65.641293] try_charge+0x720/0x1680 [ 65.674806] memcg_kmem_charge_memcg+0x7c/0x120 [ 65.687939] cache_grow_begin+0x207/0x710 [ 65.696553] fallback_alloc+0x203/0x2c0 [ 65.700519] cache_alloc_node+0x1c7/0x1e0 [ 65.704999] kmem_cache_alloc+0x1e5/0x760 [ 65.717947] shmem_alloc_inode+0x1b/0x40 [ 65.722003] alloc_inode+0x63/0x190 [ 65.725642] new_inode_pseudo+0x71/0x1a0 [ 65.738077] new_inode+0x1c/0x40 [ 65.741432] shmem_get_inode+0xf1/0x910 [ 65.771550] __shmem_file_setup.part.48+0x83/0x2a0 [ 65.776482] shmem_file_setup+0x65/0x90 [ 65.780444] __x64_sys_memfd_create+0x2af/0x4f0 The warning happened from a different path [ 66.151455] RIP: 0010:try_charge+0x734/0x1680 [...] [ 66.270886] mem_cgroup_try_charge+0x4ff/0xa70 [ 66.305602] mem_cgroup_try_charge_delay+0x1d/0x90 [ 66.310514] __handle_mm_fault+0x25be/0x4470 [ 66.366608] handle_mm_fault+0x53e/0xc80 [ 66.402384] do_page_fault+0xf6/0x8c0 [ 66.451629] page_fault+0x1e/0x30 So the oom victim indeed passed the above force path after the oom invocation. But later on hit the page fault path and that behaved differently and for some reason the force path hasn't triggered. I am wondering how could we hit the page fault path in the first place. The task is already killed! So what the hell is going on here. I must be missing something obvious here. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 7:30 PM, Michal Hocko wrote: >> >> >> A much >> >> >> friendlier for user way to say this would be print a message at the >> >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID >> >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack >> >> >> trace (does not give any useful info for user). And return EINVAL if >> >> >> it can't fly at all? And then leave the "or a kernel bug" part for the >> >> >> WARNING each occurrence of which we do want to be reported to kernel >> >> >> developers. >> >> > >> >> > But this is not applicable here. Your misconfiguration is quite obvious >> >> > because you simply set the hard limit to 0. This is not the only >> >> > situation when this can happen. There is no clear point to tell, you are >> >> > doing this wrong. If it was we would do it at that point obviously. >> >> >> >> But, isn't there a point were hard limit is set to 0? I would expect >> >> there is a something like cgroup file write handler with a value of 0 >> >> or something. >> > >> > Yeah, but this is only one instance of the problem. Other is that the >> > memcg is not reclaimable for any other reasons. And we do not know what >> > those might be >> > >> >> >> >> > If you have a strong reason to believe that this is an abuse of WARN I >> >> > am all happy to change that. But I haven't heard any yet, to be honest. >> >> >> >> WARN must not be used for anything that is not kernel bugs. If this is >> >> not kernel bug, WARN must not be used here. >> > >> > This is rather strong wording without any backing arguments. I strongly >> > doubt 90% of existing WARN* match this expectation. WARN* has >> > traditionally been a way to tell that something suspicious is going on. >> > Those situation are mostly likely not fatal but it is good to know they >> > are happening. >> > >> > Sure there is that panic_on_warn thingy which you seem to be using and I >> > suspect it is a reason why you are so careful about warnings in general >> > but my experience tells me that this configuration is barely usable >> > except for testing (which is your case). >> > >> > But as I've said, I do not insist on WARN here. All I care about is to >> > warn user that something might go south and this may be either due to >> > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior. >> >> I am a bit lost. Can limit=0 legally lead to the warnings? Or there is >> also a kernel bug on top of that and it's actually a kernel bug that >> provokes the warning? > > As I've tried to tell already. I cannot tell for sure. It is the killed > oom victim which triggered thw warning and that shouldn't really > happen. Considering this doesn't reproduce with the current linux next > nor linus tree and the oom code has changed since the version you have > tested then I would suspect there was something wrong with the memcg oom > code. But maybe the test doesn't really reproduce reliably. > >> If it's a kernel bug, then I propose to stop arguing about >> configuration and concentrate on the bug. >> If it's just the misconfiguration that triggers the warning, then can >> we separate the 2 causes of the warning (user misconfiguration and >> kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print >> a line to console if necessary)? Or if the limit=0 is somehow not >> possible/desirable to detect right away, check limit=0 at the point of >> the warning and don't want? > > No we simply cannot. There is numerous situations when this can trigger. > Say you set the hard limit to N and then try to fault in shmem file with > the size >= N. No oom killer will help to reclaim memory. Or say you > migrate the all tasks away from the memcg and then somebody triggers the > memcg OOM in that group. There is simply nobody to kill. See the point? > There is simply no direct contection between the configuration and > actual problem. Too many things might happen between those two points. > Let me repeat. We do warn because we want to hear if this happens. WARN > tends to be a good way to get that attention. If you strongly believe > this is an abuse I won't mind seeing a patch to turn it into something > different. I don't believe it is an abuse, I don't know this code well. Let's assume the misconfiguration is a red-herring for now then.
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 7:30 PM, Michal Hocko wrote: >> >> >> A much >> >> >> friendlier for user way to say this would be print a message at the >> >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID >> >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack >> >> >> trace (does not give any useful info for user). And return EINVAL if >> >> >> it can't fly at all? And then leave the "or a kernel bug" part for the >> >> >> WARNING each occurrence of which we do want to be reported to kernel >> >> >> developers. >> >> > >> >> > But this is not applicable here. Your misconfiguration is quite obvious >> >> > because you simply set the hard limit to 0. This is not the only >> >> > situation when this can happen. There is no clear point to tell, you are >> >> > doing this wrong. If it was we would do it at that point obviously. >> >> >> >> But, isn't there a point were hard limit is set to 0? I would expect >> >> there is a something like cgroup file write handler with a value of 0 >> >> or something. >> > >> > Yeah, but this is only one instance of the problem. Other is that the >> > memcg is not reclaimable for any other reasons. And we do not know what >> > those might be >> > >> >> >> >> > If you have a strong reason to believe that this is an abuse of WARN I >> >> > am all happy to change that. But I haven't heard any yet, to be honest. >> >> >> >> WARN must not be used for anything that is not kernel bugs. If this is >> >> not kernel bug, WARN must not be used here. >> > >> > This is rather strong wording without any backing arguments. I strongly >> > doubt 90% of existing WARN* match this expectation. WARN* has >> > traditionally been a way to tell that something suspicious is going on. >> > Those situation are mostly likely not fatal but it is good to know they >> > are happening. >> > >> > Sure there is that panic_on_warn thingy which you seem to be using and I >> > suspect it is a reason why you are so careful about warnings in general >> > but my experience tells me that this configuration is barely usable >> > except for testing (which is your case). >> > >> > But as I've said, I do not insist on WARN here. All I care about is to >> > warn user that something might go south and this may be either due to >> > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior. >> >> I am a bit lost. Can limit=0 legally lead to the warnings? Or there is >> also a kernel bug on top of that and it's actually a kernel bug that >> provokes the warning? > > As I've tried to tell already. I cannot tell for sure. It is the killed > oom victim which triggered thw warning and that shouldn't really > happen. Considering this doesn't reproduce with the current linux next > nor linus tree and the oom code has changed since the version you have > tested then I would suspect there was something wrong with the memcg oom > code. But maybe the test doesn't really reproduce reliably. > >> If it's a kernel bug, then I propose to stop arguing about >> configuration and concentrate on the bug. >> If it's just the misconfiguration that triggers the warning, then can >> we separate the 2 causes of the warning (user misconfiguration and >> kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print >> a line to console if necessary)? Or if the limit=0 is somehow not >> possible/desirable to detect right away, check limit=0 at the point of >> the warning and don't want? > > No we simply cannot. There is numerous situations when this can trigger. > Say you set the hard limit to N and then try to fault in shmem file with > the size >= N. No oom killer will help to reclaim memory. Or say you > migrate the all tasks away from the memcg and then somebody triggers the > memcg OOM in that group. There is simply nobody to kill. See the point? > There is simply no direct contection between the configuration and > actual problem. Too many things might happen between those two points. > Let me repeat. We do warn because we want to hear if this happens. WARN > tends to be a good way to get that attention. If you strongly believe > this is an abuse I won't mind seeing a patch to turn it into something > different. I don't believe it is an abuse, I don't know this code well. Let's assume the misconfiguration is a red-herring for now then.
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 7:44 PM, Michal Hocko wrote: > On Mon 06-08-18 08:42:02, syzbot wrote: >> Hello, >> >> syzbot has tested the proposed patch but the reproducer still triggered >> crash: >> WARNING in try_charge >> >> Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, >> file-rss:0kB, shmem-rss:0kB >> oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, >> file-rss:0kB, shmem-rss:0kB >> task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 > > Thank you. This is useful. The full oom picture is this > : [ 65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. > oom_victim=0 > [...] > : [ 65.920355] Task in /ile0 killed as a result of limit of /ile0 > : [ 65.926389] memory: usage 0kB, limit 0kB, failcnt 20 > : [ 65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB > rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB > inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB > unevictable:0KB > : [ 65.963878] Tasks state (memory values in pages): > : [ 65.968743] [ pid ] uid tgid total_vm rss pgtables_bytes > swapents oom_score_adj name > : [ 65.977615] [ 6410] 0 6410 9427 532614400 > 0 syz-executor5 > : [ 65.986647] Memory cgroup out of memory: Kill process 6410 > (syz-executor5) score 547000 or sacrifice child > : [ 65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, > anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB > : [ 66.007471] oom_reaper: reaped process 6410 (syz-executor5), now > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > : [ 66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. > oom_victim=1 > : [ 66.025137] [ cut here ] > : [ 66.029927] Memory cgroup charge failed because of no reclaimable > memory! This looks like a misconfiguration or a kernel bug. > : [ 66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 > try_charge+0x734/0x1680 > > So we have only a single task in the memcg and it is this task which > triggers the OOM. It gets killed and oom_reaped. This means that > out_of_memory should return with true and so we should retry and force > the charge as I've already mentioned. For some reason this task has > triggered the oom killer path again and then we haven't found any > eligible task and resulted in the warning. This shouldn't happen. > > I will stare to the code some more to see how the heck we get there > without passing > if (unlikely(tsk_is_oom_victim(current) || > fatal_signal_pending(current) || > current->flags & PF_EXITING)) > goto force; This is the syzkaller reproducer if it will give you any hints: mkdir(&(0x7f000340)='./file0\x00', 0x0) mount(&(0x7fc0)='./file0//ile0\x00', &(0x7f80)='./file0\x00', &(0x7f000200)='cgroup2\x00', 0x0, 0x0) r0 = open(&(0x7f40)='./file0//ile0\x00', 0x0, 0x0) r1 = openat$cgroup_procs(r0, &(0x7f0001c0)='cgroup.procs\x00', 0x2, 0x0) write$cgroup_pid(r1, &(0x7f000100), 0x12) syz_mount_image$xfs(&(0x7f000280)='xfs\x00', &(0x7f0002c0)='./file0//ile0\x00', 0x0, 0x0, &(0x7f000740), 0x0, &(0x7f000800)) getsockopt$sock_cred(r0, 0x1, 0x11, &(0x7f000180), &(0x7f000240)=0xc) r2 = open(&(0x7f40)='./file0//ile0\x00', 0x0, 0x0) write$FUSE_STATFS(r2, &(0x7f000280)={0x60, 0xfffe, 0x7, {{0x401, 0x7, 0x7d, 0x1c, 0xd8, 0xb7, 0x2, 0x3}}}, 0x60) r3 = openat$cgroup_int(r2, &(0x7f0001c0)='memory.max\x00', 0x2, 0x0) write$cgroup_int(r3, &(0x7fc0), 0x12)
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 7:44 PM, Michal Hocko wrote: > On Mon 06-08-18 08:42:02, syzbot wrote: >> Hello, >> >> syzbot has tested the proposed patch but the reproducer still triggered >> crash: >> WARNING in try_charge >> >> Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, >> file-rss:0kB, shmem-rss:0kB >> oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, >> file-rss:0kB, shmem-rss:0kB >> task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 > > Thank you. This is useful. The full oom picture is this > : [ 65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. > oom_victim=0 > [...] > : [ 65.920355] Task in /ile0 killed as a result of limit of /ile0 > : [ 65.926389] memory: usage 0kB, limit 0kB, failcnt 20 > : [ 65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB > rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB > inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB > unevictable:0KB > : [ 65.963878] Tasks state (memory values in pages): > : [ 65.968743] [ pid ] uid tgid total_vm rss pgtables_bytes > swapents oom_score_adj name > : [ 65.977615] [ 6410] 0 6410 9427 532614400 > 0 syz-executor5 > : [ 65.986647] Memory cgroup out of memory: Kill process 6410 > (syz-executor5) score 547000 or sacrifice child > : [ 65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, > anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB > : [ 66.007471] oom_reaper: reaped process 6410 (syz-executor5), now > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > : [ 66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. > oom_victim=1 > : [ 66.025137] [ cut here ] > : [ 66.029927] Memory cgroup charge failed because of no reclaimable > memory! This looks like a misconfiguration or a kernel bug. > : [ 66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 > try_charge+0x734/0x1680 > > So we have only a single task in the memcg and it is this task which > triggers the OOM. It gets killed and oom_reaped. This means that > out_of_memory should return with true and so we should retry and force > the charge as I've already mentioned. For some reason this task has > triggered the oom killer path again and then we haven't found any > eligible task and resulted in the warning. This shouldn't happen. > > I will stare to the code some more to see how the heck we get there > without passing > if (unlikely(tsk_is_oom_victim(current) || > fatal_signal_pending(current) || > current->flags & PF_EXITING)) > goto force; This is the syzkaller reproducer if it will give you any hints: mkdir(&(0x7f000340)='./file0\x00', 0x0) mount(&(0x7fc0)='./file0//ile0\x00', &(0x7f80)='./file0\x00', &(0x7f000200)='cgroup2\x00', 0x0, 0x0) r0 = open(&(0x7f40)='./file0//ile0\x00', 0x0, 0x0) r1 = openat$cgroup_procs(r0, &(0x7f0001c0)='cgroup.procs\x00', 0x2, 0x0) write$cgroup_pid(r1, &(0x7f000100), 0x12) syz_mount_image$xfs(&(0x7f000280)='xfs\x00', &(0x7f0002c0)='./file0//ile0\x00', 0x0, 0x0, &(0x7f000740), 0x0, &(0x7f000800)) getsockopt$sock_cred(r0, 0x1, 0x11, &(0x7f000180), &(0x7f000240)=0xc) r2 = open(&(0x7f40)='./file0//ile0\x00', 0x0, 0x0) write$FUSE_STATFS(r2, &(0x7f000280)={0x60, 0xfffe, 0x7, {{0x401, 0x7, 0x7d, 0x1c, 0xd8, 0xb7, 0x2, 0x3}}}, 0x60) r3 = openat$cgroup_int(r2, &(0x7f0001c0)='memory.max\x00', 0x2, 0x0) write$cgroup_int(r3, &(0x7fc0), 0x12)
Re: WARNING in try_charge
On Mon 06-08-18 08:42:02, syzbot wrote: > Hello, > > syzbot has tested the proposed patch but the reproducer still triggered > crash: > WARNING in try_charge > > Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, > file-rss:0kB, shmem-rss:0kB > oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, > file-rss:0kB, shmem-rss:0kB > task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 Thank you. This is useful. The full oom picture is this : [ 65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. oom_victim=0 [...] : [ 65.920355] Task in /ile0 killed as a result of limit of /ile0 : [ 65.926389] memory: usage 0kB, limit 0kB, failcnt 20 : [ 65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB : [ 65.963878] Tasks state (memory values in pages): : [ 65.968743] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 65.977615] [ 6410] 0 6410 9427 532614400 0 syz-executor5 : [ 65.986647] Memory cgroup out of memory: Kill process 6410 (syz-executor5) score 547000 or sacrifice child : [ 65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB : [ 66.007471] oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB : [ 66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 : [ 66.025137] [ cut here ] : [ 66.029927] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. : [ 66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 try_charge+0x734/0x1680 So we have only a single task in the memcg and it is this task which triggers the OOM. It gets killed and oom_reaped. This means that out_of_memory should return with true and so we should retry and force the charge as I've already mentioned. For some reason this task has triggered the oom killer path again and then we haven't found any eligible task and resulted in the warning. This shouldn't happen. I will stare to the code some more to see how the heck we get there without passing if (unlikely(tsk_is_oom_victim(current) || fatal_signal_pending(current) || current->flags & PF_EXITING)) goto force; -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 08:42:02, syzbot wrote: > Hello, > > syzbot has tested the proposed patch but the reproducer still triggered > crash: > WARNING in try_charge > > Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, > file-rss:0kB, shmem-rss:0kB > oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, > file-rss:0kB, shmem-rss:0kB > task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 Thank you. This is useful. The full oom picture is this : [ 65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. oom_victim=0 [...] : [ 65.920355] Task in /ile0 killed as a result of limit of /ile0 : [ 65.926389] memory: usage 0kB, limit 0kB, failcnt 20 : [ 65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB : [ 65.963878] Tasks state (memory values in pages): : [ 65.968743] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 65.977615] [ 6410] 0 6410 9427 532614400 0 syz-executor5 : [ 65.986647] Memory cgroup out of memory: Kill process 6410 (syz-executor5) score 547000 or sacrifice child : [ 65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB : [ 66.007471] oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB : [ 66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 : [ 66.025137] [ cut here ] : [ 66.029927] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. : [ 66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 try_charge+0x734/0x1680 So we have only a single task in the memcg and it is this task which triggers the OOM. It gets killed and oom_reaped. This means that out_of_memory should return with true and so we should retry and force the charge as I've already mentioned. For some reason this task has triggered the oom killer path again and then we haven't found any eligible task and resulted in the warning. This shouldn't happen. I will stare to the code some more to see how the heck we get there without passing if (unlikely(tsk_is_oom_victim(current) || fatal_signal_pending(current) || current->flags & PF_EXITING)) goto force; -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 16:58:01, Dmitry Vyukov wrote: > On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko wrote: > > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote: > >> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: > > [...] > >> >> A much > >> >> friendlier for user way to say this would be print a message at the > >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID > >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack > >> >> trace (does not give any useful info for user). And return EINVAL if > >> >> it can't fly at all? And then leave the "or a kernel bug" part for the > >> >> WARNING each occurrence of which we do want to be reported to kernel > >> >> developers. > >> > > >> > But this is not applicable here. Your misconfiguration is quite obvious > >> > because you simply set the hard limit to 0. This is not the only > >> > situation when this can happen. There is no clear point to tell, you are > >> > doing this wrong. If it was we would do it at that point obviously. > >> > >> But, isn't there a point were hard limit is set to 0? I would expect > >> there is a something like cgroup file write handler with a value of 0 > >> or something. > > > > Yeah, but this is only one instance of the problem. Other is that the > > memcg is not reclaimable for any other reasons. And we do not know what > > those might be > > > >> > >> > If you have a strong reason to believe that this is an abuse of WARN I > >> > am all happy to change that. But I haven't heard any yet, to be honest. > >> > >> WARN must not be used for anything that is not kernel bugs. If this is > >> not kernel bug, WARN must not be used here. > > > > This is rather strong wording without any backing arguments. I strongly > > doubt 90% of existing WARN* match this expectation. WARN* has > > traditionally been a way to tell that something suspicious is going on. > > Those situation are mostly likely not fatal but it is good to know they > > are happening. > > > > Sure there is that panic_on_warn thingy which you seem to be using and I > > suspect it is a reason why you are so careful about warnings in general > > but my experience tells me that this configuration is barely usable > > except for testing (which is your case). > > > > But as I've said, I do not insist on WARN here. All I care about is to > > warn user that something might go south and this may be either due to > > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior. > > I am a bit lost. Can limit=0 legally lead to the warnings? Or there is > also a kernel bug on top of that and it's actually a kernel bug that > provokes the warning? As I've tried to tell already. I cannot tell for sure. It is the killed oom victim which triggered thw warning and that shouldn't really happen. Considering this doesn't reproduce with the current linux next nor linus tree and the oom code has changed since the version you have tested then I would suspect there was something wrong with the memcg oom code. But maybe the test doesn't really reproduce reliably. > If it's a kernel bug, then I propose to stop arguing about > configuration and concentrate on the bug. > If it's just the misconfiguration that triggers the warning, then can > we separate the 2 causes of the warning (user misconfiguration and > kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print > a line to console if necessary)? Or if the limit=0 is somehow not > possible/desirable to detect right away, check limit=0 at the point of > the warning and don't want? No we simply cannot. There is numerous situations when this can trigger. Say you set the hard limit to N and then try to fault in shmem file with the size >= N. No oom killer will help to reclaim memory. Or say you migrate the all tasks away from the memcg and then somebody triggers the memcg OOM in that group. There is simply nobody to kill. See the point? There is simply no direct contection between the configuration and actual problem. Too many things might happen between those two points. Let me repeat. We do warn because we want to hear if this happens. WARN tends to be a good way to get that attention. If you strongly believe this is an abuse I won't mind seeing a patch to turn it into something different. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 16:58:01, Dmitry Vyukov wrote: > On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko wrote: > > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote: > >> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: > > [...] > >> >> A much > >> >> friendlier for user way to say this would be print a message at the > >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID > >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack > >> >> trace (does not give any useful info for user). And return EINVAL if > >> >> it can't fly at all? And then leave the "or a kernel bug" part for the > >> >> WARNING each occurrence of which we do want to be reported to kernel > >> >> developers. > >> > > >> > But this is not applicable here. Your misconfiguration is quite obvious > >> > because you simply set the hard limit to 0. This is not the only > >> > situation when this can happen. There is no clear point to tell, you are > >> > doing this wrong. If it was we would do it at that point obviously. > >> > >> But, isn't there a point were hard limit is set to 0? I would expect > >> there is a something like cgroup file write handler with a value of 0 > >> or something. > > > > Yeah, but this is only one instance of the problem. Other is that the > > memcg is not reclaimable for any other reasons. And we do not know what > > those might be > > > >> > >> > If you have a strong reason to believe that this is an abuse of WARN I > >> > am all happy to change that. But I haven't heard any yet, to be honest. > >> > >> WARN must not be used for anything that is not kernel bugs. If this is > >> not kernel bug, WARN must not be used here. > > > > This is rather strong wording without any backing arguments. I strongly > > doubt 90% of existing WARN* match this expectation. WARN* has > > traditionally been a way to tell that something suspicious is going on. > > Those situation are mostly likely not fatal but it is good to know they > > are happening. > > > > Sure there is that panic_on_warn thingy which you seem to be using and I > > suspect it is a reason why you are so careful about warnings in general > > but my experience tells me that this configuration is barely usable > > except for testing (which is your case). > > > > But as I've said, I do not insist on WARN here. All I care about is to > > warn user that something might go south and this may be either due to > > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior. > > I am a bit lost. Can limit=0 legally lead to the warnings? Or there is > also a kernel bug on top of that and it's actually a kernel bug that > provokes the warning? As I've tried to tell already. I cannot tell for sure. It is the killed oom victim which triggered thw warning and that shouldn't really happen. Considering this doesn't reproduce with the current linux next nor linus tree and the oom code has changed since the version you have tested then I would suspect there was something wrong with the memcg oom code. But maybe the test doesn't really reproduce reliably. > If it's a kernel bug, then I propose to stop arguing about > configuration and concentrate on the bug. > If it's just the misconfiguration that triggers the warning, then can > we separate the 2 causes of the warning (user misconfiguration and > kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print > a line to console if necessary)? Or if the limit=0 is somehow not > possible/desirable to detect right away, check limit=0 at the point of > the warning and don't want? No we simply cannot. There is numerous situations when this can trigger. Say you set the hard limit to N and then try to fault in shmem file with the size >= N. No oom killer will help to reclaim memory. Or say you migrate the all tasks away from the memcg and then somebody triggers the memcg OOM in that group. There is simply nobody to kill. See the point? There is simply no direct contection between the configuration and actual problem. Too many things might happen between those two points. Let me repeat. We do warn because we want to hear if this happens. WARN tends to be a good way to get that attention. If you strongly believe this is an abuse I won't mind seeing a patch to turn it into something different. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On 2018/08/07 0:42, syzbot wrote: > Hello, > > syzbot has tested the proposed patch but the reproducer still triggered crash: > WARNING in try_charge > > Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, > file-rss:0kB, shmem-rss:0kB > oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, > file-rss:0kB, shmem-rss:0kB > task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 > [ cut here ] > Memory cgroup charge failed because of no reclaimable memory! This looks like > a misconfiguration or a kernel bug. > WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 mem_cgroup_oom > mm/memcontrol.c:1706 [inline] > WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 try_charge+0x734/0x1680 > mm/memcontrol.c:2264 > Kernel panic - not syncing: panic_on_warn set ... Michal, this is "mm, oom: task_will_free_mem(current) should ignore MMF_OOM_SKIP for once." problem which you are refusing at https://www.spinics.net/lists/linux-mm/msg133774.html .
Re: WARNING in try_charge
On 2018/08/07 0:42, syzbot wrote: > Hello, > > syzbot has tested the proposed patch but the reproducer still triggered crash: > WARNING in try_charge > > Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, > file-rss:0kB, shmem-rss:0kB > oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, > file-rss:0kB, shmem-rss:0kB > task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 > [ cut here ] > Memory cgroup charge failed because of no reclaimable memory! This looks like > a misconfiguration or a kernel bug. > WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 mem_cgroup_oom > mm/memcontrol.c:1706 [inline] > WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 try_charge+0x734/0x1680 > mm/memcontrol.c:2264 > Kernel panic - not syncing: panic_on_warn set ... Michal, this is "mm, oom: task_will_free_mem(current) should ignore MMF_OOM_SKIP for once." problem which you are refusing at https://www.spinics.net/lists/linux-mm/msg133774.html .
Re: WARNING in try_charge
Hello, syzbot has tested the proposed patch but the reproducer still triggered crash: WARNING in try_charge Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 [ cut here ] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 mem_cgroup_oom mm/memcontrol.c:1706 [inline] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 try_charge+0x734/0x1680 mm/memcontrol.c:2264 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 6410 Comm: syz-executor5 Not tainted 4.18.0-rc7-next-20180803+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 panic+0x238/0x4e7 kernel/panic.c:184 __warn.cold.8+0x163/0x1ba kernel/panic.c:536 report_bug+0x252/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:996 RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline] RIP: 0010:try_charge+0x734/0x1680 mm/memcontrol.c:2264 Code: 85 b8 04 00 00 8b b5 c8 fd ff ff 44 89 f2 4c 89 ff e8 00 51 ff ff 84 c0 0f 85 31 08 00 00 48 c7 c7 c0 17 13 87 e8 8c fd 85 ff <0f> 0b 48 8d 95 f8 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00 RSP: 0018:8801be6ef580 EFLAGS: 00010286 RAX: RBX: 8801afab4c00 RCX: RDX: RSI: 816366f1 RDI: 8801be6ef270 RBP: 8801be6ef810 R08: 8801bcf48140 R09: fbfff0ff1238 R10: fbfff0ff1238 R11: 87f891c3 R12: dc00 R13: 8801be6ef7e8 R14: R15: 8801afab4c00 mem_cgroup_try_charge+0x4ff/0xa70 mm/memcontrol.c:5916 mem_cgroup_try_charge_delay+0x1d/0x90 mm/memcontrol.c:5931 do_anonymous_page mm/memory.c:3166 [inline] handle_pte_fault mm/memory.c:3971 [inline] __handle_mm_fault+0x25be/0x4470 mm/memory.c:4097 handle_mm_fault+0x53e/0xc80 mm/memory.c:4134 __do_page_fault+0x620/0xe50 arch/x86/mm/fault.c:1395 do_page_fault+0xf6/0x8c0 arch/x86/mm/fault.c:1470 page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1164 RIP: 0033:0x40e33f Code: Bad RIP value. RSP: 002b:7ffe221246e0 EFLAGS: 00010206 RAX: 7fb83d11b000 RBX: 0002 RCX: 00456b7a RDX: 00021000 RSI: 00021000 RDI: RBP: 7ffe221247c0 R08: R09: R10: R11: 0246 R12: 7ffe221248b0 R13: 7fb83d13b700 R14: 0005 R15: 0001 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Rebooting in 86400 seconds.. Tested on: commit: 116b181bb646 Add linux-next specific files for 20180803 git tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git console output: https://syzkaller.appspot.com/x/log.txt?x=1244786440 kernel config: https://syzkaller.appspot.com/x/.config?x=b4f38be7c2c519d5 compiler: gcc (GCC) 8.0.1 20180413 (experimental) patch: https://syzkaller.appspot.com/x/patch.diff?x=14c5b68c40
Re: WARNING in try_charge
Hello, syzbot has tested the proposed patch but the reproducer still triggered crash: WARNING in try_charge Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1 [ cut here ] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 mem_cgroup_oom mm/memcontrol.c:1706 [inline] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 try_charge+0x734/0x1680 mm/memcontrol.c:2264 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 6410 Comm: syz-executor5 Not tainted 4.18.0-rc7-next-20180803+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 panic+0x238/0x4e7 kernel/panic.c:184 __warn.cold.8+0x163/0x1ba kernel/panic.c:536 report_bug+0x252/0x2d0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:996 RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline] RIP: 0010:try_charge+0x734/0x1680 mm/memcontrol.c:2264 Code: 85 b8 04 00 00 8b b5 c8 fd ff ff 44 89 f2 4c 89 ff e8 00 51 ff ff 84 c0 0f 85 31 08 00 00 48 c7 c7 c0 17 13 87 e8 8c fd 85 ff <0f> 0b 48 8d 95 f8 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00 RSP: 0018:8801be6ef580 EFLAGS: 00010286 RAX: RBX: 8801afab4c00 RCX: RDX: RSI: 816366f1 RDI: 8801be6ef270 RBP: 8801be6ef810 R08: 8801bcf48140 R09: fbfff0ff1238 R10: fbfff0ff1238 R11: 87f891c3 R12: dc00 R13: 8801be6ef7e8 R14: R15: 8801afab4c00 mem_cgroup_try_charge+0x4ff/0xa70 mm/memcontrol.c:5916 mem_cgroup_try_charge_delay+0x1d/0x90 mm/memcontrol.c:5931 do_anonymous_page mm/memory.c:3166 [inline] handle_pte_fault mm/memory.c:3971 [inline] __handle_mm_fault+0x25be/0x4470 mm/memory.c:4097 handle_mm_fault+0x53e/0xc80 mm/memory.c:4134 __do_page_fault+0x620/0xe50 arch/x86/mm/fault.c:1395 do_page_fault+0xf6/0x8c0 arch/x86/mm/fault.c:1470 page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1164 RIP: 0033:0x40e33f Code: Bad RIP value. RSP: 002b:7ffe221246e0 EFLAGS: 00010206 RAX: 7fb83d11b000 RBX: 0002 RCX: 00456b7a RDX: 00021000 RSI: 00021000 RDI: RBP: 7ffe221247c0 R08: R09: R10: R11: 0246 R12: 7ffe221248b0 R13: 7fb83d13b700 R14: 0005 R15: 0001 Dumping ftrace buffer: (ftrace buffer empty) Kernel Offset: disabled Rebooting in 86400 seconds.. Tested on: commit: 116b181bb646 Add linux-next specific files for 20180803 git tree: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git console output: https://syzkaller.appspot.com/x/log.txt?x=1244786440 kernel config: https://syzkaller.appspot.com/x/.config?x=b4f38be7c2c519d5 compiler: gcc (GCC) 8.0.1 20180413 (experimental) patch: https://syzkaller.appspot.com/x/patch.diff?x=14c5b68c40
Re: WARNING in try_charge
Now, let's test next-20180803 . #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 116b181bb646afedd770985de20a68721bdb2648 diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..852cd3dbdcd9 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, bool ret; mutex_lock(_lock); + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", + current->comm, current->pid, tsk_is_oom_victim(current)); ret = out_of_memory(); mutex_unlock(_lock); return ret;
Re: WARNING in try_charge
Now, let's test next-20180803 . #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 116b181bb646afedd770985de20a68721bdb2648 diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..852cd3dbdcd9 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, bool ret; mutex_lock(_lock); + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", + current->comm, current->pid, tsk_is_oom_victim(current)); ret = out_of_memory(); mutex_unlock(_lock); return ret;
Re: WARNING in try_charge
On Mon, Aug 06, 2018 at 04:21:24PM +0200, Michal Hocko wrote: > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote: > > On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: > > > If you have a strong reason to believe that this is an abuse of WARN I > > > am all happy to change that. But I haven't heard any yet, to be honest. > > > > WARN must not be used for anything that is not kernel bugs. If this is > > not kernel bug, WARN must not be used here. > > This is rather strong wording without any backing arguments. I strongly > doubt 90% of existing WARN* match this expectation. WARN* has > traditionally been a way to tell that something suspicious is going on. > Those situation are mostly likely not fatal but it is good to know they > are happening. I have to agree with Dmitry here. WARN should indicate a real kernel issue, not user input that knowingly triggers undesirable behavior in the kernel. It's our assert() for states we don't think are possible. I would wager that MOST developers and users understand it that way.
Re: WARNING in try_charge
On Mon, Aug 06, 2018 at 04:21:24PM +0200, Michal Hocko wrote: > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote: > > On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: > > > If you have a strong reason to believe that this is an abuse of WARN I > > > am all happy to change that. But I haven't heard any yet, to be honest. > > > > WARN must not be used for anything that is not kernel bugs. If this is > > not kernel bug, WARN must not be used here. > > This is rather strong wording without any backing arguments. I strongly > doubt 90% of existing WARN* match this expectation. WARN* has > traditionally been a way to tell that something suspicious is going on. > Those situation are mostly likely not fatal but it is good to know they > are happening. I have to agree with Dmitry here. WARN should indicate a real kernel issue, not user input that knowingly triggers undesirable behavior in the kernel. It's our assert() for states we don't think are possible. I would wager that MOST developers and users understand it that way.
Re: WARNING in try_charge
On 2018/08/06 23:58, Michal Hocko wrote: >> Since I can't find mm related changes between next-20180803 (syzbot can >> reproduce) and >> next-20180806 (syzbot has not reproduced), I can't guess what makes this >> problem go away. > > Hmm, but original report was against 4.18.0-rc6-next-20180725+ kernel. > And that one had the old group oom code. /me confused. > Yes. But I confirmed that syzbot can reproduce this problem with next-20180803 which already dropped the old group oom code. Therefore, I think that syzbot is hitting a problem other than the old group oom code.
Re: WARNING in try_charge
On 2018/08/06 23:58, Michal Hocko wrote: >> Since I can't find mm related changes between next-20180803 (syzbot can >> reproduce) and >> next-20180806 (syzbot has not reproduced), I can't guess what makes this >> problem go away. > > Hmm, but original report was against 4.18.0-rc6-next-20180725+ kernel. > And that one had the old group oom code. /me confused. > Yes. But I confirmed that syzbot can reproduce this problem with next-20180803 which already dropped the old group oom code. Therefore, I think that syzbot is hitting a problem other than the old group oom code.
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko wrote: > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote: >> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: > [...] >> >> A much >> >> friendlier for user way to say this would be print a message at the >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack >> >> trace (does not give any useful info for user). And return EINVAL if >> >> it can't fly at all? And then leave the "or a kernel bug" part for the >> >> WARNING each occurrence of which we do want to be reported to kernel >> >> developers. >> > >> > But this is not applicable here. Your misconfiguration is quite obvious >> > because you simply set the hard limit to 0. This is not the only >> > situation when this can happen. There is no clear point to tell, you are >> > doing this wrong. If it was we would do it at that point obviously. >> >> But, isn't there a point were hard limit is set to 0? I would expect >> there is a something like cgroup file write handler with a value of 0 >> or something. > > Yeah, but this is only one instance of the problem. Other is that the > memcg is not reclaimable for any other reasons. And we do not know what > those might be > >> >> > If you have a strong reason to believe that this is an abuse of WARN I >> > am all happy to change that. But I haven't heard any yet, to be honest. >> >> WARN must not be used for anything that is not kernel bugs. If this is >> not kernel bug, WARN must not be used here. > > This is rather strong wording without any backing arguments. I strongly > doubt 90% of existing WARN* match this expectation. WARN* has > traditionally been a way to tell that something suspicious is going on. > Those situation are mostly likely not fatal but it is good to know they > are happening. Today syzbot covers about 1M lines of kernel code, and we fuzz for several years with panic_on_warn=1 and each unique crash is recorded and reported. Over several thousands bugs that we reported, there were maybe 2 dozens of such cases (WARN on invalid user inputs, ENOMEM, etc). The solution always was to remove the WARNING on covert to pr_err. As of now, I see only 2 such cases open: this one and WARN on ENOMEM in input subsystem. Either way, we do badly need this separation. If there are deviations we need to continue fixing them.
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko wrote: > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote: >> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: > [...] >> >> A much >> >> friendlier for user way to say this would be print a message at the >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack >> >> trace (does not give any useful info for user). And return EINVAL if >> >> it can't fly at all? And then leave the "or a kernel bug" part for the >> >> WARNING each occurrence of which we do want to be reported to kernel >> >> developers. >> > >> > But this is not applicable here. Your misconfiguration is quite obvious >> > because you simply set the hard limit to 0. This is not the only >> > situation when this can happen. There is no clear point to tell, you are >> > doing this wrong. If it was we would do it at that point obviously. >> >> But, isn't there a point were hard limit is set to 0? I would expect >> there is a something like cgroup file write handler with a value of 0 >> or something. > > Yeah, but this is only one instance of the problem. Other is that the > memcg is not reclaimable for any other reasons. And we do not know what > those might be > >> >> > If you have a strong reason to believe that this is an abuse of WARN I >> > am all happy to change that. But I haven't heard any yet, to be honest. >> >> WARN must not be used for anything that is not kernel bugs. If this is >> not kernel bug, WARN must not be used here. > > This is rather strong wording without any backing arguments. I strongly > doubt 90% of existing WARN* match this expectation. WARN* has > traditionally been a way to tell that something suspicious is going on. > Those situation are mostly likely not fatal but it is good to know they > are happening. Today syzbot covers about 1M lines of kernel code, and we fuzz for several years with panic_on_warn=1 and each unique crash is recorded and reported. Over several thousands bugs that we reported, there were maybe 2 dozens of such cases (WARN on invalid user inputs, ENOMEM, etc). The solution always was to remove the WARNING on covert to pr_err. As of now, I see only 2 such cases open: this one and WARN on ENOMEM in input subsystem. Either way, we do badly need this separation. If there are deviations we need to continue fixing them.
Re: WARNING in try_charge
On 2018/08/06 23:54, David Howells wrote: > Do you have a link to the problem? > > David > https://groups.google.com/forum/#!msg/syzkaller-bugs/R03vI7RCVco/0PijCTrcCgAJ syzbot found a reproducer, and the reproducer was working until next-20180803. But the reproducer is failing to reproduce this problem in next-20180806 despite there is no mm related change between next-20180803 and next-20180806. Therefore, I suspect that the reproducer is no longer working as intended. And there was parser change (your patch) between next-20180803 and next-20180806.
Re: WARNING in try_charge
On 2018/08/06 23:54, David Howells wrote: > Do you have a link to the problem? > > David > https://groups.google.com/forum/#!msg/syzkaller-bugs/R03vI7RCVco/0PijCTrcCgAJ syzbot found a reproducer, and the reproducer was working until next-20180803. But the reproducer is failing to reproduce this problem in next-20180806 despite there is no mm related change between next-20180803 and next-20180806. Therefore, I suspect that the reproducer is no longer working as intended. And there was parser change (your patch) between next-20180803 and next-20180806.
Re: WARNING in try_charge
On Mon 06-08-18 23:41:22, Tetsuo Handa wrote: > +David Howells > > On 2018/08/06 20:32, Michal Hocko wrote: > > On Mon 06-08-18 04:27:02, syzbot wrote: > >> Hello, > >> > >> syzbot has tested the proposed patch and the reproducer did not trigger > >> crash: > >> > >> Reported-and-tested-by: > >> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com > >> > >> Tested on: > >> > >> commit: 8c8399e0a3fb Add linux-next specific files for 20180806 > >> git tree: linux-next > >> kernel config: https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e > >> compiler: gcc (GCC) 8.0.1 20180413 (experimental) > >> patch: https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240 > >> > >> Note: testing is done by a robot and is best-effort only. > > > > OK, so this smells like a problem in the previous group oom changes. Or > > maybe it is not very easy to reproduce? > > > > Since I can't find mm related changes between next-20180803 (syzbot can > reproduce) and > next-20180806 (syzbot has not reproduced), I can't guess what makes this > problem go away. Hmm, but original report was against 4.18.0-rc6-next-20180725+ kernel. And that one had the old group oom code. /me confused. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 23:41:22, Tetsuo Handa wrote: > +David Howells > > On 2018/08/06 20:32, Michal Hocko wrote: > > On Mon 06-08-18 04:27:02, syzbot wrote: > >> Hello, > >> > >> syzbot has tested the proposed patch and the reproducer did not trigger > >> crash: > >> > >> Reported-and-tested-by: > >> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com > >> > >> Tested on: > >> > >> commit: 8c8399e0a3fb Add linux-next specific files for 20180806 > >> git tree: linux-next > >> kernel config: https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e > >> compiler: gcc (GCC) 8.0.1 20180413 (experimental) > >> patch: https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240 > >> > >> Note: testing is done by a robot and is best-effort only. > > > > OK, so this smells like a problem in the previous group oom changes. Or > > maybe it is not very easy to reproduce? > > > > Since I can't find mm related changes between next-20180803 (syzbot can > reproduce) and > next-20180806 (syzbot has not reproduced), I can't guess what makes this > problem go away. Hmm, but original report was against 4.18.0-rc6-next-20180725+ kernel. And that one had the old group oom code. /me confused. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko wrote: > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote: >> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: > [...] >> >> A much >> >> friendlier for user way to say this would be print a message at the >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack >> >> trace (does not give any useful info for user). And return EINVAL if >> >> it can't fly at all? And then leave the "or a kernel bug" part for the >> >> WARNING each occurrence of which we do want to be reported to kernel >> >> developers. >> > >> > But this is not applicable here. Your misconfiguration is quite obvious >> > because you simply set the hard limit to 0. This is not the only >> > situation when this can happen. There is no clear point to tell, you are >> > doing this wrong. If it was we would do it at that point obviously. >> >> But, isn't there a point were hard limit is set to 0? I would expect >> there is a something like cgroup file write handler with a value of 0 >> or something. > > Yeah, but this is only one instance of the problem. Other is that the > memcg is not reclaimable for any other reasons. And we do not know what > those might be > >> >> > If you have a strong reason to believe that this is an abuse of WARN I >> > am all happy to change that. But I haven't heard any yet, to be honest. >> >> WARN must not be used for anything that is not kernel bugs. If this is >> not kernel bug, WARN must not be used here. > > This is rather strong wording without any backing arguments. I strongly > doubt 90% of existing WARN* match this expectation. WARN* has > traditionally been a way to tell that something suspicious is going on. > Those situation are mostly likely not fatal but it is good to know they > are happening. > > Sure there is that panic_on_warn thingy which you seem to be using and I > suspect it is a reason why you are so careful about warnings in general > but my experience tells me that this configuration is barely usable > except for testing (which is your case). > > But as I've said, I do not insist on WARN here. All I care about is to > warn user that something might go south and this may be either due to > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior. I am a bit lost. Can limit=0 legally lead to the warnings? Or there is also a kernel bug on top of that and it's actually a kernel bug that provokes the warning? If it's a kernel bug, then I propose to stop arguing about configuration and concentrate on the bug. If it's just the misconfiguration that triggers the warning, then can we separate the 2 causes of the warning (user misconfiguration and kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print a line to console if necessary)? Or if the limit=0 is somehow not possible/desirable to detect right away, check limit=0 at the point of the warning and don't want?
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko wrote: > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote: >> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: > [...] >> >> A much >> >> friendlier for user way to say this would be print a message at the >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack >> >> trace (does not give any useful info for user). And return EINVAL if >> >> it can't fly at all? And then leave the "or a kernel bug" part for the >> >> WARNING each occurrence of which we do want to be reported to kernel >> >> developers. >> > >> > But this is not applicable here. Your misconfiguration is quite obvious >> > because you simply set the hard limit to 0. This is not the only >> > situation when this can happen. There is no clear point to tell, you are >> > doing this wrong. If it was we would do it at that point obviously. >> >> But, isn't there a point were hard limit is set to 0? I would expect >> there is a something like cgroup file write handler with a value of 0 >> or something. > > Yeah, but this is only one instance of the problem. Other is that the > memcg is not reclaimable for any other reasons. And we do not know what > those might be > >> >> > If you have a strong reason to believe that this is an abuse of WARN I >> > am all happy to change that. But I haven't heard any yet, to be honest. >> >> WARN must not be used for anything that is not kernel bugs. If this is >> not kernel bug, WARN must not be used here. > > This is rather strong wording without any backing arguments. I strongly > doubt 90% of existing WARN* match this expectation. WARN* has > traditionally been a way to tell that something suspicious is going on. > Those situation are mostly likely not fatal but it is good to know they > are happening. > > Sure there is that panic_on_warn thingy which you seem to be using and I > suspect it is a reason why you are so careful about warnings in general > but my experience tells me that this configuration is barely usable > except for testing (which is your case). > > But as I've said, I do not insist on WARN here. All I care about is to > warn user that something might go south and this may be either due to > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior. I am a bit lost. Can limit=0 legally lead to the warnings? Or there is also a kernel bug on top of that and it's actually a kernel bug that provokes the warning? If it's a kernel bug, then I propose to stop arguing about configuration and concentrate on the bug. If it's just the misconfiguration that triggers the warning, then can we separate the 2 causes of the warning (user misconfiguration and kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print a line to console if necessary)? Or if the limit=0 is somehow not possible/desirable to detect right away, check limit=0 at the point of the warning and don't want?
Re: WARNING in try_charge
Do you have a link to the problem? David
Re: WARNING in try_charge
Do you have a link to the problem? David
Re: WARNING in try_charge
+David Howells On 2018/08/06 20:32, Michal Hocko wrote: > On Mon 06-08-18 04:27:02, syzbot wrote: >> Hello, >> >> syzbot has tested the proposed patch and the reproducer did not trigger >> crash: >> >> Reported-and-tested-by: >> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com >> >> Tested on: >> >> commit: 8c8399e0a3fb Add linux-next specific files for 20180806 >> git tree: linux-next >> kernel config: https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e >> compiler: gcc (GCC) 8.0.1 20180413 (experimental) >> patch: https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240 >> >> Note: testing is done by a robot and is best-effort only. > > OK, so this smells like a problem in the previous group oom changes. Or > maybe it is not very easy to reproduce? > Since I can't find mm related changes between next-20180803 (syzbot can reproduce) and next-20180806 (syzbot has not reproduced), I can't guess what makes this problem go away. But since this problem did not occur for 3.5 hours on next-20180806 (when this problem was occurring once per 60-90 minutes), the reproducer might not be working as intended due to "kernfs, sysfs, cgroup, intel_rdt: Support fs_context" or something... ./kernel/cgroup/cgroup-internal.h | 3 ./kernel/cgroup/cgroup-v1.c| 211 ./kernel/cgroup/cgroup.c | 81
Re: WARNING in try_charge
+David Howells On 2018/08/06 20:32, Michal Hocko wrote: > On Mon 06-08-18 04:27:02, syzbot wrote: >> Hello, >> >> syzbot has tested the proposed patch and the reproducer did not trigger >> crash: >> >> Reported-and-tested-by: >> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com >> >> Tested on: >> >> commit: 8c8399e0a3fb Add linux-next specific files for 20180806 >> git tree: linux-next >> kernel config: https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e >> compiler: gcc (GCC) 8.0.1 20180413 (experimental) >> patch: https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240 >> >> Note: testing is done by a robot and is best-effort only. > > OK, so this smells like a problem in the previous group oom changes. Or > maybe it is not very easy to reproduce? > Since I can't find mm related changes between next-20180803 (syzbot can reproduce) and next-20180806 (syzbot has not reproduced), I can't guess what makes this problem go away. But since this problem did not occur for 3.5 hours on next-20180806 (when this problem was occurring once per 60-90 minutes), the reproducer might not be working as intended due to "kernfs, sysfs, cgroup, intel_rdt: Support fs_context" or something... ./kernel/cgroup/cgroup-internal.h | 3 ./kernel/cgroup/cgroup-v1.c| 211 ./kernel/cgroup/cgroup.c | 81
Re: WARNING in try_charge
On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote: > On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: [...] > >> A much > >> friendlier for user way to say this would be print a message at the > >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID > >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack > >> trace (does not give any useful info for user). And return EINVAL if > >> it can't fly at all? And then leave the "or a kernel bug" part for the > >> WARNING each occurrence of which we do want to be reported to kernel > >> developers. > > > > But this is not applicable here. Your misconfiguration is quite obvious > > because you simply set the hard limit to 0. This is not the only > > situation when this can happen. There is no clear point to tell, you are > > doing this wrong. If it was we would do it at that point obviously. > > But, isn't there a point were hard limit is set to 0? I would expect > there is a something like cgroup file write handler with a value of 0 > or something. Yeah, but this is only one instance of the problem. Other is that the memcg is not reclaimable for any other reasons. And we do not know what those might be > > > If you have a strong reason to believe that this is an abuse of WARN I > > am all happy to change that. But I haven't heard any yet, to be honest. > > WARN must not be used for anything that is not kernel bugs. If this is > not kernel bug, WARN must not be used here. This is rather strong wording without any backing arguments. I strongly doubt 90% of existing WARN* match this expectation. WARN* has traditionally been a way to tell that something suspicious is going on. Those situation are mostly likely not fatal but it is good to know they are happening. Sure there is that panic_on_warn thingy which you seem to be using and I suspect it is a reason why you are so careful about warnings in general but my experience tells me that this configuration is barely usable except for testing (which is your case). But as I've said, I do not insist on WARN here. All I care about is to warn user that something might go south and this may be either due to misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote: > On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: [...] > >> A much > >> friendlier for user way to say this would be print a message at the > >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID > >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack > >> trace (does not give any useful info for user). And return EINVAL if > >> it can't fly at all? And then leave the "or a kernel bug" part for the > >> WARNING each occurrence of which we do want to be reported to kernel > >> developers. > > > > But this is not applicable here. Your misconfiguration is quite obvious > > because you simply set the hard limit to 0. This is not the only > > situation when this can happen. There is no clear point to tell, you are > > doing this wrong. If it was we would do it at that point obviously. > > But, isn't there a point were hard limit is set to 0? I would expect > there is a something like cgroup file write handler with a value of 0 > or something. Yeah, but this is only one instance of the problem. Other is that the memcg is not reclaimable for any other reasons. And we do not know what those might be > > > If you have a strong reason to believe that this is an abuse of WARN I > > am all happy to change that. But I haven't heard any yet, to be honest. > > WARN must not be used for anything that is not kernel bugs. If this is > not kernel bug, WARN must not be used here. This is rather strong wording without any backing arguments. I strongly doubt 90% of existing WARN* match this expectation. WARN* has traditionally been a way to tell that something suspicious is going on. Those situation are mostly likely not fatal but it is good to know they are happening. Sure there is that panic_on_warn thingy which you seem to be using and I suspect it is a reason why you are so careful about warnings in general but my experience tells me that this configuration is barely usable except for testing (which is your case). But as I've said, I do not insist on WARN here. All I care about is to warn user that something might go south and this may be either due to misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 1:32 PM, Michal Hocko wrote: > On Mon 06-08-18 04:27:02, syzbot wrote: >> Hello, >> >> syzbot has tested the proposed patch and the reproducer did not trigger >> crash: >> >> Reported-and-tested-by: >> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com >> >> Tested on: >> >> commit: 8c8399e0a3fb Add linux-next specific files for 20180806 >> git tree: linux-next >> kernel config: https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e >> compiler: gcc (GCC) 8.0.1 20180413 (experimental) >> patch: https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240 >> >> Note: testing is done by a robot and is best-effort only. > > OK, so this smells like a problem in the previous group oom changes. Or > maybe it is not very easy to reproduce? It's possible to ask syzbot to test on a particular tree/commit too.
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 1:32 PM, Michal Hocko wrote: > On Mon 06-08-18 04:27:02, syzbot wrote: >> Hello, >> >> syzbot has tested the proposed patch and the reproducer did not trigger >> crash: >> >> Reported-and-tested-by: >> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com >> >> Tested on: >> >> commit: 8c8399e0a3fb Add linux-next specific files for 20180806 >> git tree: linux-next >> kernel config: https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e >> compiler: gcc (GCC) 8.0.1 20180413 (experimental) >> patch: https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240 >> >> Note: testing is done by a robot and is best-effort only. > > OK, so this smells like a problem in the previous group oom changes. Or > maybe it is not very easy to reproduce? It's possible to ask syzbot to test on a particular tree/commit too.
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: >> >> > More interesting stuff is higher in the kernel log >> >> > : [ 366.435015] >> >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 >> >> > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 >> >> > >> >> > Are you sure you want to have hard limit set to 0? >> >> >> >> syzkaller really does not mind to have it. >> > >> > So what do you use it for? What do you actually test by this setting? >> >> syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is >> doable from user-space. Some of that may not make sense, but it does >> not matter because kernel should still stand still. > > I am not questioning that. What I am saying is that the configuration > doesn't make much sense and the kernel warns about it. > >> > [...] >> >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> >> > index 4603ad75c9a9..852cd3dbdcd9 100644 >> >> > --- a/mm/memcontrol.c >> >> > +++ b/mm/memcontrol.c >> >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct >> >> > mem_cgroup *memcg, gfp_t gfp_mask, >> >> > bool ret; >> >> > >> >> > mutex_lock(_lock); >> >> > + pr_info("task=%s pid=%d invoked memcg oom killer. >> >> > oom_victim=%d\n", >> >> > + current->comm, current->pid, >> >> > tsk_is_oom_victim(current)); >> >> > ret = out_of_memory(); >> >> > mutex_unlock(_lock); >> >> > return ret; >> >> > >> >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit >> >> > and basically no memory charged by existing tasks is not going to fly >> >> > and the warning is exactly to call that out. >> >> >> >> >> >> Please-please-please do not mix kernel bugs and notices to user into >> >> the same bucket: >> > >> > Well, WARN_ON used to be a standard way to make user aware of a >> > misbehavior. In this case it warns about a pottential runaway when memcg >> > is misconfigured. I do not insist on using WARN_ON here of course. If >> > there is a general agreement that such a condition is better handled by >> > pr_err then I am fine with it. Users tend to be more sensitive on >> > WARN_ONs though. >> >> The docs change was acked by Greg, and Andrew took it into mm, Linus >> was CCed too. It missed the release because I guess it's comments only >> change, but otherwise it should reach upstream tree on the next merge >> window. >> >> WARN is _not_ a common way to notify users today. syzbot reports _all_ >> WARN occurrences and you can see there are not many of them now >> (probably 1 another now, +dtor for that one): >> https://syzkaller.appspot.com#upstream >> There is probably some long tail that we need to fix. We really do >> want systematic testing capability. You do not want every of 2 billion >> linux users to come to you with this kernel splat, just so that you >> can explain to them that it's some programs of their machines doing >> something wrong, right? > > [This is an orthogonal discussion I believe] > > How does it differ from pr_err though? pr_err output looks very different, says that it is the user program that does something wrong, explains what exactly is done wrong and does not contain traces, offsets of hex numbers that scare end users. WARN for kernel bugs/pr_err for user separation allows to easily understand (including by computer programs) if it is a kernel bugs (something to notify kernel developers about) or not. In particular if it would be the case here, we would not have this bug report and would not spent time here. >> WARN is really a bad way to inform a user about something. Consider a >> non-kernel developer, perhaps even non-programmer. What they see is >> "WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 >> try_charge+0x734/0x1680" followed by some obscure things and hex >> numbers. File:line reference is pointless, they don't what what/where >> it is. > > Well, you get a precise location where the problem happens and the > backtrace to see how it happened. This is much more information than, > pr_err without dump_stack. This information is important for kernel bugs, but not for invalid values passed by user programs. For the latter the exact location where the problem happened is not there, it's in some user program. For the latter it is more important to explain in readable language what exactly arguments to what call were wrong. Say, if you enter a wrong pin in ATM, it says "you entered wrong pin" and not dumps some internal state in hex. >> This one is slightly better because it prints "Memory cgroup >> charge failed because of no reclaimable memory! This looks like a >> misconfiguration or a kernel bug." before the warning. But still it >> says "or a kernel bug", which means that they will come to you. > > Yeah, and that was the purpose of the thing. Why? Was it clear how exactly if can happen? Can we refine it
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko wrote: >> >> > More interesting stuff is higher in the kernel log >> >> > : [ 366.435015] >> >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 >> >> > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 >> >> > >> >> > Are you sure you want to have hard limit set to 0? >> >> >> >> syzkaller really does not mind to have it. >> > >> > So what do you use it for? What do you actually test by this setting? >> >> syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is >> doable from user-space. Some of that may not make sense, but it does >> not matter because kernel should still stand still. > > I am not questioning that. What I am saying is that the configuration > doesn't make much sense and the kernel warns about it. > >> > [...] >> >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> >> > index 4603ad75c9a9..852cd3dbdcd9 100644 >> >> > --- a/mm/memcontrol.c >> >> > +++ b/mm/memcontrol.c >> >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct >> >> > mem_cgroup *memcg, gfp_t gfp_mask, >> >> > bool ret; >> >> > >> >> > mutex_lock(_lock); >> >> > + pr_info("task=%s pid=%d invoked memcg oom killer. >> >> > oom_victim=%d\n", >> >> > + current->comm, current->pid, >> >> > tsk_is_oom_victim(current)); >> >> > ret = out_of_memory(); >> >> > mutex_unlock(_lock); >> >> > return ret; >> >> > >> >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit >> >> > and basically no memory charged by existing tasks is not going to fly >> >> > and the warning is exactly to call that out. >> >> >> >> >> >> Please-please-please do not mix kernel bugs and notices to user into >> >> the same bucket: >> > >> > Well, WARN_ON used to be a standard way to make user aware of a >> > misbehavior. In this case it warns about a pottential runaway when memcg >> > is misconfigured. I do not insist on using WARN_ON here of course. If >> > there is a general agreement that such a condition is better handled by >> > pr_err then I am fine with it. Users tend to be more sensitive on >> > WARN_ONs though. >> >> The docs change was acked by Greg, and Andrew took it into mm, Linus >> was CCed too. It missed the release because I guess it's comments only >> change, but otherwise it should reach upstream tree on the next merge >> window. >> >> WARN is _not_ a common way to notify users today. syzbot reports _all_ >> WARN occurrences and you can see there are not many of them now >> (probably 1 another now, +dtor for that one): >> https://syzkaller.appspot.com#upstream >> There is probably some long tail that we need to fix. We really do >> want systematic testing capability. You do not want every of 2 billion >> linux users to come to you with this kernel splat, just so that you >> can explain to them that it's some programs of their machines doing >> something wrong, right? > > [This is an orthogonal discussion I believe] > > How does it differ from pr_err though? pr_err output looks very different, says that it is the user program that does something wrong, explains what exactly is done wrong and does not contain traces, offsets of hex numbers that scare end users. WARN for kernel bugs/pr_err for user separation allows to easily understand (including by computer programs) if it is a kernel bugs (something to notify kernel developers about) or not. In particular if it would be the case here, we would not have this bug report and would not spent time here. >> WARN is really a bad way to inform a user about something. Consider a >> non-kernel developer, perhaps even non-programmer. What they see is >> "WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 >> try_charge+0x734/0x1680" followed by some obscure things and hex >> numbers. File:line reference is pointless, they don't what what/where >> it is. > > Well, you get a precise location where the problem happens and the > backtrace to see how it happened. This is much more information than, > pr_err without dump_stack. This information is important for kernel bugs, but not for invalid values passed by user programs. For the latter the exact location where the problem happened is not there, it's in some user program. For the latter it is more important to explain in readable language what exactly arguments to what call were wrong. Say, if you enter a wrong pin in ATM, it says "you entered wrong pin" and not dumps some internal state in hex. >> This one is slightly better because it prints "Memory cgroup >> charge failed because of no reclaimable memory! This looks like a >> misconfiguration or a kernel bug." before the warning. But still it >> says "or a kernel bug", which means that they will come to you. > > Yeah, and that was the purpose of the thing. Why? Was it clear how exactly if can happen? Can we refine it
Re: WARNING in try_charge
On Mon 06-08-18 04:27:02, syzbot wrote: > Hello, > > syzbot has tested the proposed patch and the reproducer did not trigger > crash: > > Reported-and-tested-by: > syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com > > Tested on: > > commit: 8c8399e0a3fb Add linux-next specific files for 20180806 > git tree: linux-next > kernel config: https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > patch: https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240 > > Note: testing is done by a robot and is best-effort only. OK, so this smells like a problem in the previous group oom changes. Or maybe it is not very easy to reproduce? -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 04:27:02, syzbot wrote: > Hello, > > syzbot has tested the proposed patch and the reproducer did not trigger > crash: > > Reported-and-tested-by: > syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com > > Tested on: > > commit: 8c8399e0a3fb Add linux-next specific files for 20180806 > git tree: linux-next > kernel config: https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > patch: https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240 > > Note: testing is done by a robot and is best-effort only. OK, so this smells like a problem in the previous group oom changes. Or maybe it is not very easy to reproduce? -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
Hello, syzbot has tested the proposed patch and the reproducer did not trigger crash: Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com Tested on: commit: 8c8399e0a3fb Add linux-next specific files for 20180806 git tree: linux-next kernel config: https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e compiler: gcc (GCC) 8.0.1 20180413 (experimental) patch: https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240 Note: testing is done by a robot and is best-effort only.
Re: WARNING in try_charge
Hello, syzbot has tested the proposed patch and the reproducer did not trigger crash: Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com Tested on: commit: 8c8399e0a3fb Add linux-next specific files for 20180806 git tree: linux-next kernel config: https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e compiler: gcc (GCC) 8.0.1 20180413 (experimental) patch: https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240 Note: testing is done by a robot and is best-effort only.
Re: WARNING in try_charge
On Mon 06-08-18 19:47:00, Tetsuo Handa wrote: > On 2018/08/06 19:39, Dmitry Vyukov wrote: > > On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko wrote: > >> Btw. running with the above diff on top might help us to ideantify > >> whether this is a pre-mature warning or a valid one. Still useful to > >> find out. > > Since syzbot already found a syz reproducer, you can ask syzbot to test it. > > > > > The bug report has a reproducer, so you can run it with the patch. Or > > ask syzbot to test your patch: > > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches > > Which basically boils down to saying: > > > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > > master > > Excuse me, but this is linux-next only problem. Therefore, If this really is a linux-next only problem then please retest with the current linux-next which has dropped the and replaced the group oom code. > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > master > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4603ad75c9a9..852cd3dbdcd9 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup > *memcg, gfp_t gfp_mask, > bool ret; > > mutex_lock(_lock); > + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", > + current->comm, current->pid, > tsk_is_oom_victim(current)); > ret = out_of_memory(); > mutex_unlock(_lock); > return ret; > > F.Y.I. Waiting until __mmput() completes (with timeout using OOM score > feedback) > ( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solves this > race. Which just means that something else is broken. Seriously, timout is not going to fix anything. It merely changes the picture. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 19:47:00, Tetsuo Handa wrote: > On 2018/08/06 19:39, Dmitry Vyukov wrote: > > On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko wrote: > >> Btw. running with the above diff on top might help us to ideantify > >> whether this is a pre-mature warning or a valid one. Still useful to > >> find out. > > Since syzbot already found a syz reproducer, you can ask syzbot to test it. > > > > > The bug report has a reproducer, so you can run it with the patch. Or > > ask syzbot to test your patch: > > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches > > Which basically boils down to saying: > > > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > > master > > Excuse me, but this is linux-next only problem. Therefore, If this really is a linux-next only problem then please retest with the current linux-next which has dropped the and replaced the group oom code. > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git > master > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 4603ad75c9a9..852cd3dbdcd9 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup > *memcg, gfp_t gfp_mask, > bool ret; > > mutex_lock(_lock); > + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", > + current->comm, current->pid, > tsk_is_oom_victim(current)); > ret = out_of_memory(); > mutex_unlock(_lock); > return ret; > > F.Y.I. Waiting until __mmput() completes (with timeout using OOM score > feedback) > ( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solves this > race. Which just means that something else is broken. Seriously, timout is not going to fix anything. It merely changes the picture. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 12:34:30, Dmitry Vyukov wrote: > On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko wrote: > > On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote: > >> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko wrote: > > [...] > >> > More interesting stuff is higher in the kernel log > >> > : [ 366.435015] > >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 > >> > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 > >> > > >> > Are you sure you want to have hard limit set to 0? > >> > >> syzkaller really does not mind to have it. > > > > So what do you use it for? What do you actually test by this setting? > > syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is > doable from user-space. Some of that may not make sense, but it does > not matter because kernel should still stand still. I am not questioning that. What I am saying is that the configuration doesn't make much sense and the kernel warns about it. > > [...] > >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > >> > index 4603ad75c9a9..852cd3dbdcd9 100644 > >> > --- a/mm/memcontrol.c > >> > +++ b/mm/memcontrol.c > >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct > >> > mem_cgroup *memcg, gfp_t gfp_mask, > >> > bool ret; > >> > > >> > mutex_lock(_lock); > >> > + pr_info("task=%s pid=%d invoked memcg oom killer. > >> > oom_victim=%d\n", > >> > + current->comm, current->pid, > >> > tsk_is_oom_victim(current)); > >> > ret = out_of_memory(); > >> > mutex_unlock(_lock); > >> > return ret; > >> > > >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit > >> > and basically no memory charged by existing tasks is not going to fly > >> > and the warning is exactly to call that out. > >> > >> > >> Please-please-please do not mix kernel bugs and notices to user into > >> the same bucket: > > > > Well, WARN_ON used to be a standard way to make user aware of a > > misbehavior. In this case it warns about a pottential runaway when memcg > > is misconfigured. I do not insist on using WARN_ON here of course. If > > there is a general agreement that such a condition is better handled by > > pr_err then I am fine with it. Users tend to be more sensitive on > > WARN_ONs though. > > The docs change was acked by Greg, and Andrew took it into mm, Linus > was CCed too. It missed the release because I guess it's comments only > change, but otherwise it should reach upstream tree on the next merge > window. > > WARN is _not_ a common way to notify users today. syzbot reports _all_ > WARN occurrences and you can see there are not many of them now > (probably 1 another now, +dtor for that one): > https://syzkaller.appspot.com#upstream > There is probably some long tail that we need to fix. We really do > want systematic testing capability. You do not want every of 2 billion > linux users to come to you with this kernel splat, just so that you > can explain to them that it's some programs of their machines doing > something wrong, right? [This is an orthogonal discussion I believe] How does it differ from pr_err though? > WARN is really a bad way to inform a user about something. Consider a > non-kernel developer, perhaps even non-programmer. What they see is > "WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 > try_charge+0x734/0x1680" followed by some obscure things and hex > numbers. File:line reference is pointless, they don't what what/where > it is. Well, you get a precise location where the problem happens and the backtrace to see how it happened. This is much more information than, pr_err without dump_stack. > This one is slightly better because it prints "Memory cgroup > charge failed because of no reclaimable memory! This looks like a > misconfiguration or a kernel bug." before the warning. But still it > says "or a kernel bug", which means that they will come to you. Yeah, and that was the purpose of the thing. > A much > friendlier for user way to say this would be print a message at the > point of misconfiguration saying what exactly is wrong, e.g. "pid $PID > misconfigures cgroup /cgroup/path with mem.limit=0" without a stack > trace (does not give any useful info for user). And return EINVAL if > it can't fly at all? And then leave the "or a kernel bug" part for the > WARNING each occurrence of which we do want to be reported to kernel > developers. But this is not applicable here. Your misconfiguration is quite obvious because you simply set the hard limit to 0. This is not the only situation when this can happen. There is no clear point to tell, you are doing this wrong. If it was we would do it at that point obviously. If you have a strong reason to believe that this is an abuse of WARN I am all happy to change that. But I haven't heard any yet, to be honest. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 12:34:30, Dmitry Vyukov wrote: > On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko wrote: > > On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote: > >> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko wrote: > > [...] > >> > More interesting stuff is higher in the kernel log > >> > : [ 366.435015] > >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 > >> > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 > >> > > >> > Are you sure you want to have hard limit set to 0? > >> > >> syzkaller really does not mind to have it. > > > > So what do you use it for? What do you actually test by this setting? > > syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is > doable from user-space. Some of that may not make sense, but it does > not matter because kernel should still stand still. I am not questioning that. What I am saying is that the configuration doesn't make much sense and the kernel warns about it. > > [...] > >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > >> > index 4603ad75c9a9..852cd3dbdcd9 100644 > >> > --- a/mm/memcontrol.c > >> > +++ b/mm/memcontrol.c > >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct > >> > mem_cgroup *memcg, gfp_t gfp_mask, > >> > bool ret; > >> > > >> > mutex_lock(_lock); > >> > + pr_info("task=%s pid=%d invoked memcg oom killer. > >> > oom_victim=%d\n", > >> > + current->comm, current->pid, > >> > tsk_is_oom_victim(current)); > >> > ret = out_of_memory(); > >> > mutex_unlock(_lock); > >> > return ret; > >> > > >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit > >> > and basically no memory charged by existing tasks is not going to fly > >> > and the warning is exactly to call that out. > >> > >> > >> Please-please-please do not mix kernel bugs and notices to user into > >> the same bucket: > > > > Well, WARN_ON used to be a standard way to make user aware of a > > misbehavior. In this case it warns about a pottential runaway when memcg > > is misconfigured. I do not insist on using WARN_ON here of course. If > > there is a general agreement that such a condition is better handled by > > pr_err then I am fine with it. Users tend to be more sensitive on > > WARN_ONs though. > > The docs change was acked by Greg, and Andrew took it into mm, Linus > was CCed too. It missed the release because I guess it's comments only > change, but otherwise it should reach upstream tree on the next merge > window. > > WARN is _not_ a common way to notify users today. syzbot reports _all_ > WARN occurrences and you can see there are not many of them now > (probably 1 another now, +dtor for that one): > https://syzkaller.appspot.com#upstream > There is probably some long tail that we need to fix. We really do > want systematic testing capability. You do not want every of 2 billion > linux users to come to you with this kernel splat, just so that you > can explain to them that it's some programs of their machines doing > something wrong, right? [This is an orthogonal discussion I believe] How does it differ from pr_err though? > WARN is really a bad way to inform a user about something. Consider a > non-kernel developer, perhaps even non-programmer. What they see is > "WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 > try_charge+0x734/0x1680" followed by some obscure things and hex > numbers. File:line reference is pointless, they don't what what/where > it is. Well, you get a precise location where the problem happens and the backtrace to see how it happened. This is much more information than, pr_err without dump_stack. > This one is slightly better because it prints "Memory cgroup > charge failed because of no reclaimable memory! This looks like a > misconfiguration or a kernel bug." before the warning. But still it > says "or a kernel bug", which means that they will come to you. Yeah, and that was the purpose of the thing. > A much > friendlier for user way to say this would be print a message at the > point of misconfiguration saying what exactly is wrong, e.g. "pid $PID > misconfigures cgroup /cgroup/path with mem.limit=0" without a stack > trace (does not give any useful info for user). And return EINVAL if > it can't fly at all? And then leave the "or a kernel bug" part for the > WARNING each occurrence of which we do want to be reported to kernel > developers. But this is not applicable here. Your misconfiguration is quite obvious because you simply set the hard limit to 0. This is not the only situation when this can happen. There is no clear point to tell, you are doing this wrong. If it was we would do it at that point obviously. If you have a strong reason to believe that this is an abuse of WARN I am all happy to change that. But I haven't heard any yet, to be honest. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
Hello, syzbot has tested the proposed patch and the reproducer did not trigger crash: Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com Tested on: commit: 1ffaddd029c8 Linux 4.18-rc8 git tree: upstream kernel config: https://syzkaller.appspot.com/x/.config?x=3bdb367561cb7285 compiler: gcc (GCC) 8.0.1 20180413 (experimental) patch: https://syzkaller.appspot.com/x/patch.diff?x=146f983040 Note: testing is done by a robot and is best-effort only.
Re: WARNING in try_charge
Hello, syzbot has tested the proposed patch and the reproducer did not trigger crash: Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com Tested on: commit: 1ffaddd029c8 Linux 4.18-rc8 git tree: upstream kernel config: https://syzkaller.appspot.com/x/.config?x=3bdb367561cb7285 compiler: gcc (GCC) 8.0.1 20180413 (experimental) patch: https://syzkaller.appspot.com/x/patch.diff?x=146f983040 Note: testing is done by a robot and is best-effort only.
Re: WARNING in try_charge
On 2018/08/06 19:39, Dmitry Vyukov wrote: > On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko wrote: >> Btw. running with the above diff on top might help us to ideantify >> whether this is a pre-mature warning or a valid one. Still useful to >> find out. Since syzbot already found a syz reproducer, you can ask syzbot to test it. > > The bug report has a reproducer, so you can run it with the patch. Or > ask syzbot to test your patch: > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches > Which basically boils down to saying: > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > master Excuse me, but this is linux-next only problem. Therefore, #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..852cd3dbdcd9 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, bool ret; mutex_lock(_lock); + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", + current->comm, current->pid, tsk_is_oom_victim(current)); ret = out_of_memory(); mutex_unlock(_lock); return ret; F.Y.I. Waiting until __mmput() completes (with timeout using OOM score feedback) ( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solves this race.
Re: WARNING in try_charge
On 2018/08/06 19:39, Dmitry Vyukov wrote: > On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko wrote: >> Btw. running with the above diff on top might help us to ideantify >> whether this is a pre-mature warning or a valid one. Still useful to >> find out. Since syzbot already found a syz reproducer, you can ask syzbot to test it. > > The bug report has a reproducer, so you can run it with the patch. Or > ask syzbot to test your patch: > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches > Which basically boils down to saying: > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > master Excuse me, but this is linux-next only problem. Therefore, #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..852cd3dbdcd9 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, bool ret; mutex_lock(_lock); + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", + current->comm, current->pid, tsk_is_oom_victim(current)); ret = out_of_memory(); mutex_unlock(_lock); return ret; F.Y.I. Waiting until __mmput() completes (with timeout using OOM score feedback) ( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solves this race.
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko wrote: > On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote: >> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko wrote: > [...] >> > More interesting stuff is higher in the kernel log >> > : [ 366.435015] >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 >> > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 >> > >> > Are you sure you want to have hard limit set to 0? >> >> syzkaller really does not mind to have it. > > So what do you use it for? What do you actually test by this setting? > > [...] >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> > index 4603ad75c9a9..852cd3dbdcd9 100644 >> > --- a/mm/memcontrol.c >> > +++ b/mm/memcontrol.c >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct >> > mem_cgroup *memcg, gfp_t gfp_mask, >> > bool ret; >> > >> > mutex_lock(_lock); >> > + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", >> > + current->comm, current->pid, >> > tsk_is_oom_victim(current)); >> > ret = out_of_memory(); >> > mutex_unlock(_lock); >> > return ret; >> > >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit >> > and basically no memory charged by existing tasks is not going to fly >> > and the warning is exactly to call that out. >> >> >> Please-please-please do not mix kernel bugs and notices to user into >> the same bucket: > > Well, WARN_ON used to be a standard way to make user aware of a > misbehavior. In this case it warns about a pottential runaway when memcg > is misconfigured. I do not insist on using WARN_ON here of course. If > there is a general agreement that such a condition is better handled by > pr_err then I am fine with it. Users tend to be more sensitive on > WARN_ONs though. > > Btw. running with the above diff on top might help us to ideantify > whether this is a pre-mature warning or a valid one. Still useful to > find out. The bug report has a reproducer, so you can run it with the patch. Or ask syzbot to test your patch: https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches Which basically boils down to saying: #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master Note that a text patch without a base tree/commit can be useless, I used torvalds/linux.git but I don't know if it will apply there or not. Let's see. diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..852cd3dbdcd9 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, bool ret; mutex_lock(_lock); + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", + current->comm, current->pid, tsk_is_oom_victim(current)); ret = out_of_memory(); mutex_unlock(_lock); return ret;
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko wrote: > On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote: >> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko wrote: > [...] >> > More interesting stuff is higher in the kernel log >> > : [ 366.435015] >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 >> > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 >> > >> > Are you sure you want to have hard limit set to 0? >> >> syzkaller really does not mind to have it. > > So what do you use it for? What do you actually test by this setting? > > [...] >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> > index 4603ad75c9a9..852cd3dbdcd9 100644 >> > --- a/mm/memcontrol.c >> > +++ b/mm/memcontrol.c >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct >> > mem_cgroup *memcg, gfp_t gfp_mask, >> > bool ret; >> > >> > mutex_lock(_lock); >> > + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", >> > + current->comm, current->pid, >> > tsk_is_oom_victim(current)); >> > ret = out_of_memory(); >> > mutex_unlock(_lock); >> > return ret; >> > >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit >> > and basically no memory charged by existing tasks is not going to fly >> > and the warning is exactly to call that out. >> >> >> Please-please-please do not mix kernel bugs and notices to user into >> the same bucket: > > Well, WARN_ON used to be a standard way to make user aware of a > misbehavior. In this case it warns about a pottential runaway when memcg > is misconfigured. I do not insist on using WARN_ON here of course. If > there is a general agreement that such a condition is better handled by > pr_err then I am fine with it. Users tend to be more sensitive on > WARN_ONs though. > > Btw. running with the above diff on top might help us to ideantify > whether this is a pre-mature warning or a valid one. Still useful to > find out. The bug report has a reproducer, so you can run it with the patch. Or ask syzbot to test your patch: https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches Which basically boils down to saying: #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master Note that a text patch without a base tree/commit can be useless, I used torvalds/linux.git but I don't know if it will apply there or not. Let's see. diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 4603ad75c9a9..852cd3dbdcd9 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask, bool ret; mutex_lock(_lock); + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", + current->comm, current->pid, tsk_is_oom_victim(current)); ret = out_of_memory(); mutex_unlock(_lock); return ret;
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko wrote: > On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote: >> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko wrote: > [...] >> > More interesting stuff is higher in the kernel log >> > : [ 366.435015] >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 >> > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 >> > >> > Are you sure you want to have hard limit set to 0? >> >> syzkaller really does not mind to have it. > > So what do you use it for? What do you actually test by this setting? syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is doable from user-space. Some of that may not make sense, but it does not matter because kernel should still stand still. > [...] >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> > index 4603ad75c9a9..852cd3dbdcd9 100644 >> > --- a/mm/memcontrol.c >> > +++ b/mm/memcontrol.c >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct >> > mem_cgroup *memcg, gfp_t gfp_mask, >> > bool ret; >> > >> > mutex_lock(_lock); >> > + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", >> > + current->comm, current->pid, >> > tsk_is_oom_victim(current)); >> > ret = out_of_memory(); >> > mutex_unlock(_lock); >> > return ret; >> > >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit >> > and basically no memory charged by existing tasks is not going to fly >> > and the warning is exactly to call that out. >> >> >> Please-please-please do not mix kernel bugs and notices to user into >> the same bucket: > > Well, WARN_ON used to be a standard way to make user aware of a > misbehavior. In this case it warns about a pottential runaway when memcg > is misconfigured. I do not insist on using WARN_ON here of course. If > there is a general agreement that such a condition is better handled by > pr_err then I am fine with it. Users tend to be more sensitive on > WARN_ONs though. The docs change was acked by Greg, and Andrew took it into mm, Linus was CCed too. It missed the release because I guess it's comments only change, but otherwise it should reach upstream tree on the next merge window. WARN is _not_ a common way to notify users today. syzbot reports _all_ WARN occurrences and you can see there are not many of them now (probably 1 another now, +dtor for that one): https://syzkaller.appspot.com#upstream There is probably some long tail that we need to fix. We really do want systematic testing capability. You do not want every of 2 billion linux users to come to you with this kernel splat, just so that you can explain to them that it's some programs of their machines doing something wrong, right? WARN is really a bad way to inform a user about something. Consider a non-kernel developer, perhaps even non-programmer. What they see is "WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 try_charge+0x734/0x1680" followed by some obscure things and hex numbers. File:line reference is pointless, they don't what what/where it is. This one is slightly better because it prints "Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug." before the warning. But still it says "or a kernel bug", which means that they will come to you. A much friendlier for user way to say this would be print a message at the point of misconfiguration saying what exactly is wrong, e.g. "pid $PID misconfigures cgroup /cgroup/path with mem.limit=0" without a stack trace (does not give any useful info for user). And return EINVAL if it can't fly at all? And then leave the "or a kernel bug" part for the WARNING each occurrence of which we do want to be reported to kernel developers.
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko wrote: > On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote: >> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko wrote: > [...] >> > More interesting stuff is higher in the kernel log >> > : [ 366.435015] >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 >> > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 >> > >> > Are you sure you want to have hard limit set to 0? >> >> syzkaller really does not mind to have it. > > So what do you use it for? What do you actually test by this setting? syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is doable from user-space. Some of that may not make sense, but it does not matter because kernel should still stand still. > [...] >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> > index 4603ad75c9a9..852cd3dbdcd9 100644 >> > --- a/mm/memcontrol.c >> > +++ b/mm/memcontrol.c >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct >> > mem_cgroup *memcg, gfp_t gfp_mask, >> > bool ret; >> > >> > mutex_lock(_lock); >> > + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", >> > + current->comm, current->pid, >> > tsk_is_oom_victim(current)); >> > ret = out_of_memory(); >> > mutex_unlock(_lock); >> > return ret; >> > >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit >> > and basically no memory charged by existing tasks is not going to fly >> > and the warning is exactly to call that out. >> >> >> Please-please-please do not mix kernel bugs and notices to user into >> the same bucket: > > Well, WARN_ON used to be a standard way to make user aware of a > misbehavior. In this case it warns about a pottential runaway when memcg > is misconfigured. I do not insist on using WARN_ON here of course. If > there is a general agreement that such a condition is better handled by > pr_err then I am fine with it. Users tend to be more sensitive on > WARN_ONs though. The docs change was acked by Greg, and Andrew took it into mm, Linus was CCed too. It missed the release because I guess it's comments only change, but otherwise it should reach upstream tree on the next merge window. WARN is _not_ a common way to notify users today. syzbot reports _all_ WARN occurrences and you can see there are not many of them now (probably 1 another now, +dtor for that one): https://syzkaller.appspot.com#upstream There is probably some long tail that we need to fix. We really do want systematic testing capability. You do not want every of 2 billion linux users to come to you with this kernel splat, just so that you can explain to them that it's some programs of their machines doing something wrong, right? WARN is really a bad way to inform a user about something. Consider a non-kernel developer, perhaps even non-programmer. What they see is "WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 try_charge+0x734/0x1680" followed by some obscure things and hex numbers. File:line reference is pointless, they don't what what/where it is. This one is slightly better because it prints "Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug." before the warning. But still it says "or a kernel bug", which means that they will come to you. A much friendlier for user way to say this would be print a message at the point of misconfiguration saying what exactly is wrong, e.g. "pid $PID misconfigures cgroup /cgroup/path with mem.limit=0" without a stack trace (does not give any useful info for user). And return EINVAL if it can't fly at all? And then leave the "or a kernel bug" part for the WARNING each occurrence of which we do want to be reported to kernel developers.
Re: WARNING in try_charge
On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote: > On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko wrote: [...] > > More interesting stuff is higher in the kernel log > > : [ 366.435015] > > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 > > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 > > > > Are you sure you want to have hard limit set to 0? > > syzkaller really does not mind to have it. So what do you use it for? What do you actually test by this setting? [...] > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 4603ad75c9a9..852cd3dbdcd9 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct > > mem_cgroup *memcg, gfp_t gfp_mask, > > bool ret; > > > > mutex_lock(_lock); > > + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", > > + current->comm, current->pid, > > tsk_is_oom_victim(current)); > > ret = out_of_memory(); > > mutex_unlock(_lock); > > return ret; > > > > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit > > and basically no memory charged by existing tasks is not going to fly > > and the warning is exactly to call that out. > > > Please-please-please do not mix kernel bugs and notices to user into > the same bucket: Well, WARN_ON used to be a standard way to make user aware of a misbehavior. In this case it warns about a pottential runaway when memcg is misconfigured. I do not insist on using WARN_ON here of course. If there is a general agreement that such a condition is better handled by pr_err then I am fine with it. Users tend to be more sensitive on WARN_ONs though. Btw. running with the above diff on top might help us to ideantify whether this is a pre-mature warning or a valid one. Still useful to find out. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote: > On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko wrote: [...] > > More interesting stuff is higher in the kernel log > > : [ 366.435015] > > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 > > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 > > > > Are you sure you want to have hard limit set to 0? > > syzkaller really does not mind to have it. So what do you use it for? What do you actually test by this setting? [...] > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 4603ad75c9a9..852cd3dbdcd9 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct > > mem_cgroup *memcg, gfp_t gfp_mask, > > bool ret; > > > > mutex_lock(_lock); > > + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n", > > + current->comm, current->pid, > > tsk_is_oom_victim(current)); > > ret = out_of_memory(); > > mutex_unlock(_lock); > > return ret; > > > > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit > > and basically no memory charged by existing tasks is not going to fly > > and the warning is exactly to call that out. > > > Please-please-please do not mix kernel bugs and notices to user into > the same bucket: Well, WARN_ON used to be a standard way to make user aware of a misbehavior. In this case it warns about a pottential runaway when memcg is misconfigured. I do not insist on using WARN_ON here of course. If there is a general agreement that such a condition is better handled by pr_err then I am fine with it. Users tend to be more sensitive on WARN_ONs though. Btw. running with the above diff on top might help us to ideantify whether this is a pre-mature warning or a valid one. Still useful to find out. -- Michal Hocko SUSE Labs
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko wrote: > On Sat 04-08-18 06:33:02, syzbot wrote: >> Hello, >> >> syzbot found the following crash on: >> >> HEAD commit:d1e0b8e0cb7a Add linux-next specific files for 20180725 >> git tree: linux-next >> console output: https://syzkaller.appspot.com/x/log.txt?x=15a1c77040 >> kernel config: https://syzkaller.appspot.com/x/.config?x=eef3552c897e4d33 >> dashboard link: https://syzkaller.appspot.com/bug?extid=bab151e82a4e973fa325 >> compiler: gcc (GCC) 8.0.1 20180413 (experimental) >> >> Unfortunately, I don't have any reproducer for this crash yet. >> >> IMPORTANT: if you fix the bug, please add the following tag to the commit: >> Reported-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com >> >> Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB, >> file-rss:32768kB, shmem-rss:0kB >> oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB, >> file-rss:32000kB, shmem-rss:0kB > > More interesting stuff is higher in the kernel log > : [ 366.435015] > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 > > Are you sure you want to have hard limit set to 0? syzkaller really does not mind to have it. > : [ 366.454963] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 366.461787] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 366.467946] Memory cgroup stats for /ile0: cache:12KB rss:0KB > rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB > inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB > unevictable:0KB > > There are only 3 pages charged to this memcg! > > : [ 366.487490] Tasks state (memory values in pages): > : [ 366.492349] [ pid ] uid tgid total_vm rss pgtables_bytes > swapents oom_score_adj name > : [ 366.501237] [ 23766] 0 2376617620 8221 1269760 > 0 syz-executor3 > : [ 366.510367] [ 23767] 0 2376717618 8218 1269760 > 0 syz-executor2 > : [ 366.519409] Memory cgroup out of memory: Kill process 23766 > (syz-executor3) score 8252000 or sacrifice child > : [ 366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, > anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB > : [ 366.540456] oom_reaper: reaped process 23766 (syz-executor3), now > anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB > > The oom reaper cannot reclaim file backed memory from a large part. I > assume this is are shared mappings which are living outside of memcg > because of the counter. > > : [...] > : [ 367.085870] > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor2,pid=23767,uid=0 > : [ 367.100073] memory: usage 112kB, limit 0kB, failcnt 1615 > : [ 367.105549] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 367.112428] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 367.118593] Memory cgroup stats for /ile0: cache:12KB rss:0KB > rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB > inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB > unevictable:0KB > : [ 367.138136] Tasks state (memory values in pages): > : [ 367.142986] [ pid ] uid tgid total_vm rss pgtables_bytes > swapents oom_score_adj name > : [ 367.151889] [ 23766] 0 2376617620 8002 1269760 > 0 syz-executor3 > : [ 367.160946] [ 23767] 0 2376717618 8218 1269760 > 0 syz-executor2 > : [ 367.169994] Memory cgroup out of memory: Kill process 23767 > (syz-executor2) score 8249000 or sacrifice child > : [ 367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, > anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB > : [ 367.192101] oom_reaper: reaped process 23767 (syz-executor2), now > anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB > : [ 367.202986] [ cut here ] > : [ 367.207845] Memory cgroup charge failed because of no reclaimable > memory! This looks like a misconfiguration or a kernel bug. > : [ 367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 > try_charge+0x734/0x1680 > : [ 367.227540] Kernel panic - not syncing: panic_on_warn set ... > > This is unexpected though. We have killed a task (23767) which is trying > to charge the memory which means it should > trigger the charge retry and that one should force the charge > > /* > * Unlike in global OOM situations, memcg is not in a physical > * memory shortage. Allow dying and OOM-killed tasks to > * bypass the last charges so that they can exit quickly and > * free their memory. > */ > if (unlikely(tsk_is_oom_victim(current) || >
Re: WARNING in try_charge
On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko wrote: > On Sat 04-08-18 06:33:02, syzbot wrote: >> Hello, >> >> syzbot found the following crash on: >> >> HEAD commit:d1e0b8e0cb7a Add linux-next specific files for 20180725 >> git tree: linux-next >> console output: https://syzkaller.appspot.com/x/log.txt?x=15a1c77040 >> kernel config: https://syzkaller.appspot.com/x/.config?x=eef3552c897e4d33 >> dashboard link: https://syzkaller.appspot.com/bug?extid=bab151e82a4e973fa325 >> compiler: gcc (GCC) 8.0.1 20180413 (experimental) >> >> Unfortunately, I don't have any reproducer for this crash yet. >> >> IMPORTANT: if you fix the bug, please add the following tag to the commit: >> Reported-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com >> >> Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB, >> file-rss:32768kB, shmem-rss:0kB >> oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB, >> file-rss:32000kB, shmem-rss:0kB > > More interesting stuff is higher in the kernel log > : [ 366.435015] > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 > : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 > > Are you sure you want to have hard limit set to 0? syzkaller really does not mind to have it. > : [ 366.454963] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 366.461787] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 366.467946] Memory cgroup stats for /ile0: cache:12KB rss:0KB > rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB > inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB > unevictable:0KB > > There are only 3 pages charged to this memcg! > > : [ 366.487490] Tasks state (memory values in pages): > : [ 366.492349] [ pid ] uid tgid total_vm rss pgtables_bytes > swapents oom_score_adj name > : [ 366.501237] [ 23766] 0 2376617620 8221 1269760 > 0 syz-executor3 > : [ 366.510367] [ 23767] 0 2376717618 8218 1269760 > 0 syz-executor2 > : [ 366.519409] Memory cgroup out of memory: Kill process 23766 > (syz-executor3) score 8252000 or sacrifice child > : [ 366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, > anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB > : [ 366.540456] oom_reaper: reaped process 23766 (syz-executor3), now > anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB > > The oom reaper cannot reclaim file backed memory from a large part. I > assume this is are shared mappings which are living outside of memcg > because of the counter. > > : [...] > : [ 367.085870] > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor2,pid=23767,uid=0 > : [ 367.100073] memory: usage 112kB, limit 0kB, failcnt 1615 > : [ 367.105549] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 367.112428] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 > : [ 367.118593] Memory cgroup stats for /ile0: cache:12KB rss:0KB > rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB > inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB > unevictable:0KB > : [ 367.138136] Tasks state (memory values in pages): > : [ 367.142986] [ pid ] uid tgid total_vm rss pgtables_bytes > swapents oom_score_adj name > : [ 367.151889] [ 23766] 0 2376617620 8002 1269760 > 0 syz-executor3 > : [ 367.160946] [ 23767] 0 2376717618 8218 1269760 > 0 syz-executor2 > : [ 367.169994] Memory cgroup out of memory: Kill process 23767 > (syz-executor2) score 8249000 or sacrifice child > : [ 367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, > anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB > : [ 367.192101] oom_reaper: reaped process 23767 (syz-executor2), now > anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB > : [ 367.202986] [ cut here ] > : [ 367.207845] Memory cgroup charge failed because of no reclaimable > memory! This looks like a misconfiguration or a kernel bug. > : [ 367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 > try_charge+0x734/0x1680 > : [ 367.227540] Kernel panic - not syncing: panic_on_warn set ... > > This is unexpected though. We have killed a task (23767) which is trying > to charge the memory which means it should > trigger the charge retry and that one should force the charge > > /* > * Unlike in global OOM situations, memcg is not in a physical > * memory shortage. Allow dying and OOM-killed tasks to > * bypass the last charges so that they can exit quickly and > * free their memory. > */ > if (unlikely(tsk_is_oom_victim(current) || >
Re: WARNING in try_charge
On Sat 04-08-18 06:33:02, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit:d1e0b8e0cb7a Add linux-next specific files for 20180725 > git tree: linux-next > console output: https://syzkaller.appspot.com/x/log.txt?x=15a1c77040 > kernel config: https://syzkaller.appspot.com/x/.config?x=eef3552c897e4d33 > dashboard link: https://syzkaller.appspot.com/bug?extid=bab151e82a4e973fa325 > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > Unfortunately, I don't have any reproducer for this crash yet. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com > > Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB, > file-rss:32768kB, shmem-rss:0kB > oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB, > file-rss:32000kB, shmem-rss:0kB More interesting stuff is higher in the kernel log : [ 366.435015] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 Are you sure you want to have hard limit set to 0? : [ 366.454963] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 366.461787] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 366.467946] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB There are only 3 pages charged to this memcg! : [ 366.487490] Tasks state (memory values in pages): : [ 366.492349] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 366.501237] [ 23766] 0 2376617620 8221 1269760 0 syz-executor3 : [ 366.510367] [ 23767] 0 2376717618 8218 1269760 0 syz-executor2 : [ 366.519409] Memory cgroup out of memory: Kill process 23766 (syz-executor3) score 8252000 or sacrifice child : [ 366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB : [ 366.540456] oom_reaper: reaped process 23766 (syz-executor3), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB The oom reaper cannot reclaim file backed memory from a large part. I assume this is are shared mappings which are living outside of memcg because of the counter. : [...] : [ 367.085870] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor2,pid=23767,uid=0 : [ 367.100073] memory: usage 112kB, limit 0kB, failcnt 1615 : [ 367.105549] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 367.112428] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 367.118593] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB : [ 367.138136] Tasks state (memory values in pages): : [ 367.142986] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 367.151889] [ 23766] 0 2376617620 8002 1269760 0 syz-executor3 : [ 367.160946] [ 23767] 0 2376717618 8218 1269760 0 syz-executor2 : [ 367.169994] Memory cgroup out of memory: Kill process 23767 (syz-executor2) score 8249000 or sacrifice child : [ 367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB : [ 367.192101] oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB : [ 367.202986] [ cut here ] : [ 367.207845] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. : [ 367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 try_charge+0x734/0x1680 : [ 367.227540] Kernel panic - not syncing: panic_on_warn set ... This is unexpected though. We have killed a task (23767) which is trying to charge the memory which means it should trigger the charge retry and that one should force the charge /* * Unlike in global OOM situations, memcg is not in a physical * memory shortage. Allow dying and OOM-killed tasks to * bypass the last charges so that they can exit quickly and * free their memory. */ if (unlikely(tsk_is_oom_victim(current) || fatal_signal_pending(current) || current->flags & PF_EXITING)) goto force; There doesn't seem to be any other sign of OOM killer invocation which could then indeed lead to the warning as there is no other task to kill (both
Re: WARNING in try_charge
On Sat 04-08-18 06:33:02, syzbot wrote: > Hello, > > syzbot found the following crash on: > > HEAD commit:d1e0b8e0cb7a Add linux-next specific files for 20180725 > git tree: linux-next > console output: https://syzkaller.appspot.com/x/log.txt?x=15a1c77040 > kernel config: https://syzkaller.appspot.com/x/.config?x=eef3552c897e4d33 > dashboard link: https://syzkaller.appspot.com/bug?extid=bab151e82a4e973fa325 > compiler: gcc (GCC) 8.0.1 20180413 (experimental) > > Unfortunately, I don't have any reproducer for this crash yet. > > IMPORTANT: if you fix the bug, please add the following tag to the commit: > Reported-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com > > Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB, > file-rss:32768kB, shmem-rss:0kB > oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB, > file-rss:32000kB, shmem-rss:0kB More interesting stuff is higher in the kernel log : [ 366.435015] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0 : [ 366.449416] memory: usage 112kB, limit 0kB, failcnt 1605 Are you sure you want to have hard limit set to 0? : [ 366.454963] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 366.461787] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 366.467946] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB There are only 3 pages charged to this memcg! : [ 366.487490] Tasks state (memory values in pages): : [ 366.492349] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 366.501237] [ 23766] 0 2376617620 8221 1269760 0 syz-executor3 : [ 366.510367] [ 23767] 0 2376717618 8218 1269760 0 syz-executor2 : [ 366.519409] Memory cgroup out of memory: Kill process 23766 (syz-executor3) score 8252000 or sacrifice child : [ 366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB : [ 366.540456] oom_reaper: reaped process 23766 (syz-executor3), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB The oom reaper cannot reclaim file backed memory from a large part. I assume this is are shared mappings which are living outside of memcg because of the counter. : [...] : [ 367.085870] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor2,pid=23767,uid=0 : [ 367.100073] memory: usage 112kB, limit 0kB, failcnt 1615 : [ 367.105549] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 367.112428] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0 : [ 367.118593] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB : [ 367.138136] Tasks state (memory values in pages): : [ 367.142986] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name : [ 367.151889] [ 23766] 0 2376617620 8002 1269760 0 syz-executor3 : [ 367.160946] [ 23767] 0 2376717618 8218 1269760 0 syz-executor2 : [ 367.169994] Memory cgroup out of memory: Kill process 23767 (syz-executor2) score 8249000 or sacrifice child : [ 367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB : [ 367.192101] oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB : [ 367.202986] [ cut here ] : [ 367.207845] Memory cgroup charge failed because of no reclaimable memory! This looks like a misconfiguration or a kernel bug. : [ 367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 try_charge+0x734/0x1680 : [ 367.227540] Kernel panic - not syncing: panic_on_warn set ... This is unexpected though. We have killed a task (23767) which is trying to charge the memory which means it should trigger the charge retry and that one should force the charge /* * Unlike in global OOM situations, memcg is not in a physical * memory shortage. Allow dying and OOM-killed tasks to * bypass the last charges so that they can exit quickly and * free their memory. */ if (unlikely(tsk_is_oom_victim(current) || fatal_signal_pending(current) || current->flags & PF_EXITING)) goto force; There doesn't seem to be any other sign of OOM killer invocation which could then indeed lead to the warning as there is no other task to kill (both
Re: WARNING in try_charge
On 2018/08/04 22:45, Tetsuo Handa wrote: > syzbot is hitting WARN(1) because of mem_cgroup_out_of_memory() == false. Since syzbot found a syz reproducer, I asked syzbot to try two patches. Setting MMF_OOM_SKIP under oom_lock to prevent from races ( https://syzkaller.appspot.com/x/patch.diff?x=10fb3fd040 ) was not sufficient. Waiting until __mmput() completes (with timeout using OOM score feedback) ( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solved this race.
Re: WARNING in try_charge
On 2018/08/04 22:45, Tetsuo Handa wrote: > syzbot is hitting WARN(1) because of mem_cgroup_out_of_memory() == false. Since syzbot found a syz reproducer, I asked syzbot to try two patches. Setting MMF_OOM_SKIP under oom_lock to prevent from races ( https://syzkaller.appspot.com/x/patch.diff?x=10fb3fd040 ) was not sufficient. Waiting until __mmput() completes (with timeout using OOM score feedback) ( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solved this race.