Re: WARNING in try_charge

2018-08-09 Thread Tetsuo Handa
On 2018/08/10 0:07, Michal Hocko wrote:
> On Thu 09-08-18 22:57:43, Tetsuo Handa wrote:
>> >From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa 
>> Date: Thu, 9 Aug 2018 22:49:40 +0900
>> Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until
>>  memory reserve fails
>>
>> Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
>> oom_reaped tasks") changed to select next OOM victim as soon as
>> MMF_OOM_SKIP is set. But we don't need to select next OOM victim as
>> long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1)
>> caused by this race window [1].
> 
> It is not because the syzbot was exercising a completely different code
> path (memcg charge rather than the page allocator).

I know syzbot is hitting memcg charge path.

> 
>> Since memcg OOM case uses forced charge if current thread is killed,
>> out_of_memory() can return true without selecting next OOM victim.
>> Therefore, this patch changes task_will_free_mem(current) to ignore
>> MMF_OOM_SKIP unless ALLOC_OOM allocation failed.
> 
> And the patch is simply wrong for memcg.
> 

Why? I think I should have done

-+  page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM
-+   || (gfp_mask & __GFP_NOMEMALLOC), ac,
-+   _some_progress);
++  page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM,
++   ac, _some_progress);

because nobody will use __GFP_NOMEMALLOC | __GFP_NOFAIL. But for memcg charge
path, task_will_free_mem(current, false) == true and out_of_memory() will return
true, which avoids unnecessary OOM killing.

Of course, this patch cannot avoid unnecessary OOM killing if out_of_memory()
is called by not yet killed process. But to mitigate it, what can we do other
than defer setting MMF_OOM_SKIP using a timeout based mechanism? Making
the OOM reaper unconditionally reclaim all memory is not a valid answer.



Re: WARNING in try_charge

2018-08-09 Thread Tetsuo Handa
On 2018/08/10 0:07, Michal Hocko wrote:
> On Thu 09-08-18 22:57:43, Tetsuo Handa wrote:
>> >From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001
>> From: Tetsuo Handa 
>> Date: Thu, 9 Aug 2018 22:49:40 +0900
>> Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until
>>  memory reserve fails
>>
>> Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
>> oom_reaped tasks") changed to select next OOM victim as soon as
>> MMF_OOM_SKIP is set. But we don't need to select next OOM victim as
>> long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1)
>> caused by this race window [1].
> 
> It is not because the syzbot was exercising a completely different code
> path (memcg charge rather than the page allocator).

I know syzbot is hitting memcg charge path.

> 
>> Since memcg OOM case uses forced charge if current thread is killed,
>> out_of_memory() can return true without selecting next OOM victim.
>> Therefore, this patch changes task_will_free_mem(current) to ignore
>> MMF_OOM_SKIP unless ALLOC_OOM allocation failed.
> 
> And the patch is simply wrong for memcg.
> 

Why? I think I should have done

-+  page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM
-+   || (gfp_mask & __GFP_NOMEMALLOC), ac,
-+   _some_progress);
++  page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM,
++   ac, _some_progress);

because nobody will use __GFP_NOMEMALLOC | __GFP_NOFAIL. But for memcg charge
path, task_will_free_mem(current, false) == true and out_of_memory() will return
true, which avoids unnecessary OOM killing.

Of course, this patch cannot avoid unnecessary OOM killing if out_of_memory()
is called by not yet killed process. But to mitigate it, what can we do other
than defer setting MMF_OOM_SKIP using a timeout based mechanism? Making
the OOM reaper unconditionally reclaim all memory is not a valid answer.



Re: WARNING in try_charge

2018-08-09 Thread Johannes Weiner
On Thu, Aug 09, 2018 at 10:57:43PM +0900, Tetsuo Handa wrote:
> From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa 
> Date: Thu, 9 Aug 2018 22:49:40 +0900
> Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until
>  memory reserve fails
> 
> Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
> oom_reaped tasks") changed to select next OOM victim as soon as
> MMF_OOM_SKIP is set. But we don't need to select next OOM victim as
> long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1)
> caused by this race window [1].

Huh? That's the memcg path, it has nothing to do with ALLOC_OOM.

> Since memcg OOM case uses forced charge if current thread is killed,
> out_of_memory() can return true without selecting next OOM victim.
> Therefore, this patch changes task_will_free_mem(current) to ignore
> MMF_OOM_SKIP unless ALLOC_OOM allocation failed.

I have no idea how the first and the second half of this paragraph go
together. They're completely independent code paths.


Re: WARNING in try_charge

2018-08-09 Thread Johannes Weiner
On Thu, Aug 09, 2018 at 10:57:43PM +0900, Tetsuo Handa wrote:
> From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa 
> Date: Thu, 9 Aug 2018 22:49:40 +0900
> Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until
>  memory reserve fails
> 
> Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
> oom_reaped tasks") changed to select next OOM victim as soon as
> MMF_OOM_SKIP is set. But we don't need to select next OOM victim as
> long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1)
> caused by this race window [1].

Huh? That's the memcg path, it has nothing to do with ALLOC_OOM.

> Since memcg OOM case uses forced charge if current thread is killed,
> out_of_memory() can return true without selecting next OOM victim.
> Therefore, this patch changes task_will_free_mem(current) to ignore
> MMF_OOM_SKIP unless ALLOC_OOM allocation failed.

I have no idea how the first and the second half of this paragraph go
together. They're completely independent code paths.


Re: WARNING in try_charge

2018-08-09 Thread Michal Hocko
On Thu 09-08-18 22:57:43, Tetsuo Handa wrote:
> >From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa 
> Date: Thu, 9 Aug 2018 22:49:40 +0900
> Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until
>  memory reserve fails
> 
> Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
> oom_reaped tasks") changed to select next OOM victim as soon as
> MMF_OOM_SKIP is set. But we don't need to select next OOM victim as
> long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1)
> caused by this race window [1].

It is not because the syzbot was exercising a completely different code
path (memcg charge rather than the page allocator).

> Since memcg OOM case uses forced charge if current thread is killed,
> out_of_memory() can return true without selecting next OOM victim.
> Therefore, this patch changes task_will_free_mem(current) to ignore
> MMF_OOM_SKIP unless ALLOC_OOM allocation failed.

And the patch is simply wrong for memcg.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-09 Thread Michal Hocko
On Thu 09-08-18 22:57:43, Tetsuo Handa wrote:
> >From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa 
> Date: Thu, 9 Aug 2018 22:49:40 +0900
> Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until
>  memory reserve fails
> 
> Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
> oom_reaped tasks") changed to select next OOM victim as soon as
> MMF_OOM_SKIP is set. But we don't need to select next OOM victim as
> long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1)
> caused by this race window [1].

It is not because the syzbot was exercising a completely different code
path (memcg charge rather than the page allocator).

> Since memcg OOM case uses forced charge if current thread is killed,
> out_of_memory() can return true without selecting next OOM victim.
> Therefore, this patch changes task_will_free_mem(current) to ignore
> MMF_OOM_SKIP unless ALLOC_OOM allocation failed.

And the patch is simply wrong for memcg.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-09 Thread Tetsuo Handa
>From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001
From: Tetsuo Handa 
Date: Thu, 9 Aug 2018 22:49:40 +0900
Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until
 memory reserve fails

Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
oom_reaped tasks") changed to select next OOM victim as soon as
MMF_OOM_SKIP is set. But we don't need to select next OOM victim as
long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1)
caused by this race window [1].

Since memcg OOM case uses forced charge if current thread is killed,
out_of_memory() can return true without selecting next OOM victim.
Therefore, this patch changes task_will_free_mem(current) to ignore
MMF_OOM_SKIP unless ALLOC_OOM allocation failed.

[1] 
https://syzkaller.appspot.com/bug?id=ea8c7912757d253537375e981b61749b2da69258

Signed-off-by: Tetsuo Handa 
Reported-by: syzbot 
Cc: Michal Hocko 
Cc: Oleg Nesterov 
Cc: Vladimir Davydov 
Cc: David Rientjes 
---
 include/linux/oom.h | 3 +++
 mm/oom_kill.c   | 8 
 mm/page_alloc.c | 7 +--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 69864a5..b5abacd 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -38,6 +38,9 @@ struct oom_control {
 */
const int order;
 
+   /* Did we already try ALLOC_OOM allocation? i*/
+   const bool reserve_tried;
+
/* Used by oom implementation, do not set */
unsigned long totalpages;
struct task_struct *chosen;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0e10b86..95453e8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -782,7 +782,7 @@ static inline bool __task_will_free_mem(struct task_struct 
*task)
  * Caller has to make sure that task->mm is stable (hold task_lock or
  * it operates on the current).
  */
-static bool task_will_free_mem(struct task_struct *task)
+static bool task_will_free_mem(struct task_struct *task, bool select_new)
 {
struct mm_struct *mm = task->mm;
struct task_struct *p;
@@ -803,7 +803,7 @@ static bool task_will_free_mem(struct task_struct *task)
 * This task has already been drained by the oom reaper so there are
 * only small chances it will free some more
 */
-   if (test_bit(MMF_OOM_SKIP, >flags))
+   if (test_bit(MMF_OOM_SKIP, >flags) && select_new)
return false;
 
if (atomic_read(>mm_users) <= 1)
@@ -939,7 +939,7 @@ static void oom_kill_process(struct oom_control *oc, const 
char *message)
 * so it can die quickly
 */
task_lock(p);
-   if (task_will_free_mem(p)) {
+   if (task_will_free_mem(p, true)) {
mark_oom_victim(p);
wake_oom_reaper(p);
task_unlock(p);
@@ -1069,7 +1069,7 @@ bool out_of_memory(struct oom_control *oc)
 * select it.  The goal is to allow it to allocate so that it may
 * quickly exit and free its memory.
 */
-   if (task_will_free_mem(current)) {
+   if (task_will_free_mem(current, oc->reserve_tried)) {
mark_oom_victim(current);
wake_oom_reaper(current);
return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 879b861..03ca29a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3455,7 +3455,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, 
const char *fmt, ...)
 }
 
 static inline struct page *
-__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
+__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, bool reserve_tried,
const struct alloc_context *ac, unsigned long *did_some_progress)
 {
struct oom_control oc = {
@@ -3464,6 +3464,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, 
const char *fmt, ...)
.memcg = NULL,
.gfp_mask = gfp_mask,
.order = order,
+   .reserve_tried = reserve_tried,
};
struct page *page;
 
@@ -4239,7 +4240,9 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
goto retry_cpuset;
 
/* Reclaim has failed us, start killing things */
-   page = __alloc_pages_may_oom(gfp_mask, order, ac, _some_progress);
+   page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM
+|| (gfp_mask & __GFP_NOMEMALLOC), ac,
+_some_progress);
if (page)
goto got_pg;
 
-- 
1.8.3.1




Re: WARNING in try_charge

2018-08-09 Thread Tetsuo Handa
>From b1f38168f14397c7af9c122cd8207663d96e02ec Mon Sep 17 00:00:00 2001
From: Tetsuo Handa 
Date: Thu, 9 Aug 2018 22:49:40 +0900
Subject: [PATCH] mm, oom: task_will_free_mem(current) should retry until
 memory reserve fails

Commit 696453e66630ad45 ("mm, oom: task_will_free_mem should skip
oom_reaped tasks") changed to select next OOM victim as soon as
MMF_OOM_SKIP is set. But we don't need to select next OOM victim as
long as ALLOC_OOM allocation can succeed. And syzbot is hitting WARN(1)
caused by this race window [1].

Since memcg OOM case uses forced charge if current thread is killed,
out_of_memory() can return true without selecting next OOM victim.
Therefore, this patch changes task_will_free_mem(current) to ignore
MMF_OOM_SKIP unless ALLOC_OOM allocation failed.

[1] 
https://syzkaller.appspot.com/bug?id=ea8c7912757d253537375e981b61749b2da69258

Signed-off-by: Tetsuo Handa 
Reported-by: syzbot 
Cc: Michal Hocko 
Cc: Oleg Nesterov 
Cc: Vladimir Davydov 
Cc: David Rientjes 
---
 include/linux/oom.h | 3 +++
 mm/oom_kill.c   | 8 
 mm/page_alloc.c | 7 +--
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 69864a5..b5abacd 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -38,6 +38,9 @@ struct oom_control {
 */
const int order;
 
+   /* Did we already try ALLOC_OOM allocation? i*/
+   const bool reserve_tried;
+
/* Used by oom implementation, do not set */
unsigned long totalpages;
struct task_struct *chosen;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 0e10b86..95453e8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -782,7 +782,7 @@ static inline bool __task_will_free_mem(struct task_struct 
*task)
  * Caller has to make sure that task->mm is stable (hold task_lock or
  * it operates on the current).
  */
-static bool task_will_free_mem(struct task_struct *task)
+static bool task_will_free_mem(struct task_struct *task, bool select_new)
 {
struct mm_struct *mm = task->mm;
struct task_struct *p;
@@ -803,7 +803,7 @@ static bool task_will_free_mem(struct task_struct *task)
 * This task has already been drained by the oom reaper so there are
 * only small chances it will free some more
 */
-   if (test_bit(MMF_OOM_SKIP, >flags))
+   if (test_bit(MMF_OOM_SKIP, >flags) && select_new)
return false;
 
if (atomic_read(>mm_users) <= 1)
@@ -939,7 +939,7 @@ static void oom_kill_process(struct oom_control *oc, const 
char *message)
 * so it can die quickly
 */
task_lock(p);
-   if (task_will_free_mem(p)) {
+   if (task_will_free_mem(p, true)) {
mark_oom_victim(p);
wake_oom_reaper(p);
task_unlock(p);
@@ -1069,7 +1069,7 @@ bool out_of_memory(struct oom_control *oc)
 * select it.  The goal is to allow it to allocate so that it may
 * quickly exit and free its memory.
 */
-   if (task_will_free_mem(current)) {
+   if (task_will_free_mem(current, oc->reserve_tried)) {
mark_oom_victim(current);
wake_oom_reaper(current);
return true;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 879b861..03ca29a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3455,7 +3455,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, 
const char *fmt, ...)
 }
 
 static inline struct page *
-__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
+__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order, bool reserve_tried,
const struct alloc_context *ac, unsigned long *did_some_progress)
 {
struct oom_control oc = {
@@ -3464,6 +3464,7 @@ void warn_alloc(gfp_t gfp_mask, nodemask_t *nodemask, 
const char *fmt, ...)
.memcg = NULL,
.gfp_mask = gfp_mask,
.order = order,
+   .reserve_tried = reserve_tried,
};
struct page *page;
 
@@ -4239,7 +4240,9 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
goto retry_cpuset;
 
/* Reclaim has failed us, start killing things */
-   page = __alloc_pages_may_oom(gfp_mask, order, ac, _some_progress);
+   page = __alloc_pages_may_oom(gfp_mask, order, alloc_flags == ALLOC_OOM
+|| (gfp_mask & __GFP_NOMEMALLOC), ac,
+_some_progress);
if (page)
goto got_pg;
 
-- 
1.8.3.1




Re: WARNING in try_charge

2018-08-07 Thread Michal Hocko
On Tue 07-08-18 13:18:00, Dmitry Vyukov wrote:
[...]
> Great we are making progress here!
> 
> So if it's something to fix in kernel we just leave WARN alone. It
> served its intended purpose of notifying kernel developers about
> something to fix in kernel.

Yes, agreed! And the way how your syzbot automation works made it so
much easier so thanks a lot!

The patch has been posted 
http://lkml.kernel.org/r/20180807072553.14941-1-mho...@kernel.org
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-07 Thread Michal Hocko
On Tue 07-08-18 13:18:00, Dmitry Vyukov wrote:
[...]
> Great we are making progress here!
> 
> So if it's something to fix in kernel we just leave WARN alone. It
> served its intended purpose of notifying kernel developers about
> something to fix in kernel.

Yes, agreed! And the way how your syzbot automation works made it so
much easier so thanks a lot!

The patch has been posted 
http://lkml.kernel.org/r/20180807072553.14941-1-mho...@kernel.org
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-07 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 8:55 PM, Michal Hocko  wrote:
> The debugging patch was wrong but I guess I see it finally.
> It's a race
>
> : [   72.901666] Memory cgroup out of memory: Kill process 6584 
> (syz-executor1) score 55 or sacrifice child
> : [   72.917037] Killed process 6584 (syz-executor1) total-vm:37704kB, 
> anon-rss:2140kB, file-rss:0kB, shmem-rss:0kB
> : [   72.927256] task=syz-executor5 pid=6581 charge bypass
> : [   72.928046] oom_reaper: reaped process 6584 (syz-executor1), now 
> anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> : [   72.932818] task=syz-executor6 pid=6576 invoked memcg oom killer. 
> oom_victim=1
> : [   72.942790] task=syz-executor5 pid=6581 charge for nr_pages=1
> : [   72.949769] syz-executor6 invoked oom-killer: 
> gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, 
> oom_score_adj=0
> : [   72.955606] task=syz-executor5 pid=6581 charge bypass
> : [   72.967394] syz-executor6 cpuset=/ mems_allowed=0
> : [   72.973175] task=syz-executor5 pid=6581 charge for nr_pages=1
> : [...]
> : [   73.534865] Task in /ile0 killed as a result of limit of /ile0
> : [   73.540865] memory: usage 76kB, limit 0kB, failcnt 260
> : [   73.546142] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   73.552898] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   73.559051] Memory cgroup stats for /ile0: cache:0KB rss:0KB 
> rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB 
> inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB 
> unevictable:0KB
> : [   73.578533] Tasks state (memory values in pages):
> : [   73.583404] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
> swapents oom_score_adj name
> : [   73.592277] [   6569] 0  6562 94271532480
>  0 syz-executor0
> : [   73.601299] [   6576] 0  6576 94260614400
>  0 syz-executor6
> : [   73.610333] [   6578] 0  6578 9426  534614400
>  0 syz-executor4
> : [   73.619381] [   6579] 0  6579 94260573440
>  0 syz-executor5
> : [   73.628414] [   6582] 0  6582 94260614400
>  0 syz-executor7
> : [   73.637441] [   6584] 0  6584 94260573440
>  0 syz-executor1
> : [   73.646464] Memory cgroup out of memory: Kill process 6578 
> (syz-executor4) score 549000 or sacrifice child
> : [   73.656295] task=syz-executor6 pid=6576 is oom victim now
>
> This should be 6578 but we at least know that we are running in 6576
> context so the we are setting the state from a remote context which
> itself has been killed already
>
> : [   73.661841] Killed process 6578 (syz-executor4) total-vm:37704kB, 
> anon-rss:2136kB, file-rss:0kB, shmem-rss:0kB
> : [   73.672035] task=syz-executor6 pid=6576 charge bypass
> : [   73.672801] oom_reaper: reaped process 6578 (syz-executor4), now 
> anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> : [   73.678829] task=syz-executor4 pid=6578 invoked memcg oom killer. 
> oom_victim=1
>
> and here the victim finally reached the oom path finally.
>
> : [   73.687453] task=syz-executor6 pid=6576 charge for nr_pages=1
> : [   73.694534] [ cut here ]
> : [   73.700424] task=syz-executor6 pid=6576 charge bypass
> : [   73.705175] Memory cgroup charge failed because of no reclaimable 
> memory! This looks like a misconfiguration or a kernel bug.
> : [   73.705321] WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 
> try_charge+0xafa/0x1710
>
> But there is nobody killable. So the oom kill happened _after_ our force
> charge path. Therefore we should do the following regardless whether we
> make tis warn or pr_$foo

Great we are making progress here!

So if it's something to fix in kernel we just leave WARN alone. It
served its intended purpose of notifying kernel developers about
something to fix in kernel. And as you noted 0 is not actually special
in this context anyway. I misunderstood how exactly misconfiguration
is involved here.


> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
> 116b181bb646afedd770985de20a68721bdb2648
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4603ad75c9a9..1b6eed1bc404 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup 
> *memcg, gfp_t mask, int
> return OOM_ASYNC;
> }
>
> -   if (mem_cgroup_out_of_memory(memcg, mask, order))
> +   if (mem_cgroup_out_of_memory(memcg, mask, order) ||
> +   tsk_is_oom_victim(current))
> return OOM_SUCCESS;
>
> WARN(1,"Memory cgroup charge failed because of no reclaimable memory! 
> "
> --
> Michal Hocko
> SUSE Labs


Re: WARNING in try_charge

2018-08-07 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 8:55 PM, Michal Hocko  wrote:
> The debugging patch was wrong but I guess I see it finally.
> It's a race
>
> : [   72.901666] Memory cgroup out of memory: Kill process 6584 
> (syz-executor1) score 55 or sacrifice child
> : [   72.917037] Killed process 6584 (syz-executor1) total-vm:37704kB, 
> anon-rss:2140kB, file-rss:0kB, shmem-rss:0kB
> : [   72.927256] task=syz-executor5 pid=6581 charge bypass
> : [   72.928046] oom_reaper: reaped process 6584 (syz-executor1), now 
> anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> : [   72.932818] task=syz-executor6 pid=6576 invoked memcg oom killer. 
> oom_victim=1
> : [   72.942790] task=syz-executor5 pid=6581 charge for nr_pages=1
> : [   72.949769] syz-executor6 invoked oom-killer: 
> gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, 
> oom_score_adj=0
> : [   72.955606] task=syz-executor5 pid=6581 charge bypass
> : [   72.967394] syz-executor6 cpuset=/ mems_allowed=0
> : [   72.973175] task=syz-executor5 pid=6581 charge for nr_pages=1
> : [...]
> : [   73.534865] Task in /ile0 killed as a result of limit of /ile0
> : [   73.540865] memory: usage 76kB, limit 0kB, failcnt 260
> : [   73.546142] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   73.552898] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   73.559051] Memory cgroup stats for /ile0: cache:0KB rss:0KB 
> rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB 
> inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB 
> unevictable:0KB
> : [   73.578533] Tasks state (memory values in pages):
> : [   73.583404] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
> swapents oom_score_adj name
> : [   73.592277] [   6569] 0  6562 94271532480
>  0 syz-executor0
> : [   73.601299] [   6576] 0  6576 94260614400
>  0 syz-executor6
> : [   73.610333] [   6578] 0  6578 9426  534614400
>  0 syz-executor4
> : [   73.619381] [   6579] 0  6579 94260573440
>  0 syz-executor5
> : [   73.628414] [   6582] 0  6582 94260614400
>  0 syz-executor7
> : [   73.637441] [   6584] 0  6584 94260573440
>  0 syz-executor1
> : [   73.646464] Memory cgroup out of memory: Kill process 6578 
> (syz-executor4) score 549000 or sacrifice child
> : [   73.656295] task=syz-executor6 pid=6576 is oom victim now
>
> This should be 6578 but we at least know that we are running in 6576
> context so the we are setting the state from a remote context which
> itself has been killed already
>
> : [   73.661841] Killed process 6578 (syz-executor4) total-vm:37704kB, 
> anon-rss:2136kB, file-rss:0kB, shmem-rss:0kB
> : [   73.672035] task=syz-executor6 pid=6576 charge bypass
> : [   73.672801] oom_reaper: reaped process 6578 (syz-executor4), now 
> anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> : [   73.678829] task=syz-executor4 pid=6578 invoked memcg oom killer. 
> oom_victim=1
>
> and here the victim finally reached the oom path finally.
>
> : [   73.687453] task=syz-executor6 pid=6576 charge for nr_pages=1
> : [   73.694534] [ cut here ]
> : [   73.700424] task=syz-executor6 pid=6576 charge bypass
> : [   73.705175] Memory cgroup charge failed because of no reclaimable 
> memory! This looks like a misconfiguration or a kernel bug.
> : [   73.705321] WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 
> try_charge+0xafa/0x1710
>
> But there is nobody killable. So the oom kill happened _after_ our force
> charge path. Therefore we should do the following regardless whether we
> make tis warn or pr_$foo

Great we are making progress here!

So if it's something to fix in kernel we just leave WARN alone. It
served its intended purpose of notifying kernel developers about
something to fix in kernel. And as you noted 0 is not actually special
in this context anyway. I misunderstood how exactly misconfiguration
is involved here.


> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
> 116b181bb646afedd770985de20a68721bdb2648
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4603ad75c9a9..1b6eed1bc404 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup 
> *memcg, gfp_t mask, int
> return OOM_ASYNC;
> }
>
> -   if (mem_cgroup_out_of_memory(memcg, mask, order))
> +   if (mem_cgroup_out_of_memory(memcg, mask, order) ||
> +   tsk_is_oom_victim(current))
> return OOM_SUCCESS;
>
> WARN(1,"Memory cgroup charge failed because of no reclaimable memory! 
> "
> --
> Michal Hocko
> SUSE Labs


Re: WARNING in try_charge

2018-08-07 Thread Tetsuo Handa
On 2018/08/07 6:50, Tetsuo Handa wrote:
>   list_for_each_entry_safe(p, tmp, _victim_list, oom_victim_list) {
>   struct mm_struct *mm = p->signal->oom_mm;
>  
>   if (oom_unkillable_task(p, oc->memcg, oc->nodemask))
>   continue;
>   ret = true;
> + /*
> +  * Since memcg OOM allows forced charge, we can safely wait
> +  * until __mmput() completes.
> +  */
> + if (is_memcg_oom(oc))
> + return true;

Oops. If this OOM victim was blocked on some lock which current thread is
holding, waiting forever unconditionally is not safe.

>  #ifdef CONFIG_MMU
>   /*
>* Since the OOM reaper exists, we can safely wait until
>* MMF_OOM_SKIP is set.
>*/
>   if (!test_bit(MMF_OOM_SKIP, >flags)) {
>   if (!oom_reap_target) {
>   get_task_struct(p);
>   oom_reap_target = p;
>   trace_wake_reaper(p->pid);
>   wake_up(_reaper_wait);
>   }
>  #endif
>   continue;
>   }
>  #endif
>   /* We can wait as long as OOM score is decreasing over time. */
>   if (!victim_mm_stalling(p, mm))
>   continue;
>   gaveup = true;
>   list_del(>oom_victim_list);
>   /* Drop a reference taken by mark_oom_victim(). */
>   put_task_struct(p);
>   }
>   if (gaveup)
>   debug_show_all_locks();
>  
>   return ret;
>  }
> 


Re: WARNING in try_charge

2018-08-07 Thread Tetsuo Handa
On 2018/08/07 6:50, Tetsuo Handa wrote:
>   list_for_each_entry_safe(p, tmp, _victim_list, oom_victim_list) {
>   struct mm_struct *mm = p->signal->oom_mm;
>  
>   if (oom_unkillable_task(p, oc->memcg, oc->nodemask))
>   continue;
>   ret = true;
> + /*
> +  * Since memcg OOM allows forced charge, we can safely wait
> +  * until __mmput() completes.
> +  */
> + if (is_memcg_oom(oc))
> + return true;

Oops. If this OOM victim was blocked on some lock which current thread is
holding, waiting forever unconditionally is not safe.

>  #ifdef CONFIG_MMU
>   /*
>* Since the OOM reaper exists, we can safely wait until
>* MMF_OOM_SKIP is set.
>*/
>   if (!test_bit(MMF_OOM_SKIP, >flags)) {
>   if (!oom_reap_target) {
>   get_task_struct(p);
>   oom_reap_target = p;
>   trace_wake_reaper(p->pid);
>   wake_up(_reaper_wait);
>   }
>  #endif
>   continue;
>   }
>  #endif
>   /* We can wait as long as OOM score is decreasing over time. */
>   if (!victim_mm_stalling(p, mm))
>   continue;
>   gaveup = true;
>   list_del(>oom_victim_list);
>   /* Drop a reference taken by mark_oom_victim(). */
>   put_task_struct(p);
>   }
>   if (gaveup)
>   debug_show_all_locks();
>  
>   return ret;
>  }
> 


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/07 5:55, Michal Hocko wrote:
> On Tue 07-08-18 05:46:04, Tetsuo Handa wrote:
>> On 2018/08/07 5:34, Michal Hocko wrote:
>>> On Tue 07-08-18 05:26:23, Tetsuo Handa wrote:
 On 2018/08/07 2:56, Michal Hocko wrote:
> So the oom victim indeed passed the above force path after the oom
> invocation. But later on hit the page fault path and that behaved
> differently and for some reason the force path hasn't triggered. I am
> wondering how could we hit the page fault path in the first place. The
> task is already killed! So what the hell is going on here.
>
> I must be missing something obvious here.
>
 YOU ARE OBVIOUSLY MISSING MY MAIL!

 I already said this is "mm, oom: task_will_free_mem(current) should ignore 
 MMF_OOM_SKIP for once."
 problem which you are refusing at 
 https://www.spinics.net/lists/linux-mm/msg133774.html .
 And you again ignored my mail. Very sad...
>>>
>>> Your suggestion simply didn't make much sense. There is nothing like
>>> first check is different from the rest.
>>>
>>
>> I don't think your patch is appropriate. It avoids hitting WARN(1) but does 
>> not avoid
>> unnecessary killing of OOM victims.
>>
>> If you look at 
>> https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 , you will
>> notice that both 23766 and 23767 are killed due to 
>> task_will_free_mem(current) == false.
>> This is "unnecessary killing of additional processes".
> 
> Have you noticed the mere detail that the memcg has to kill any task
> attempting the charge because the hard limit is 0? There is simply no
> other way around. You cannot charge. There is no unnecessary killing.
> Full stop. We do allow temporary breach of the hard limit just to let
> the task die and uncharge on the way out.
> 

select_bad_process() is called just because
task_will_free_mem("already killed current thread which has not completed 
__mmput()") == false
is a bug. I'm saying that the OOM killer should not give up as soon as 
MMF_OOM_SKIP is set.

 static bool oom_has_pending_victims(struct oom_control *oc)
 {
struct task_struct *p, *tmp;
bool ret = false;
bool gaveup = false;
 
if (is_sysrq_oom(oc))
return false;
/*
 * Wait for pending victims until __mmput() completes or stalled
 * too long.
 */
list_for_each_entry_safe(p, tmp, _victim_list, oom_victim_list) {
struct mm_struct *mm = p->signal->oom_mm;
 
if (oom_unkillable_task(p, oc->memcg, oc->nodemask))
continue;
ret = true;
+   /*
+* Since memcg OOM allows forced charge, we can safely wait
+* until __mmput() completes.
+*/
+   if (is_memcg_oom(oc))
+   return true;
 #ifdef CONFIG_MMU
/*
 * Since the OOM reaper exists, we can safely wait until
 * MMF_OOM_SKIP is set.
 */
if (!test_bit(MMF_OOM_SKIP, >flags)) {
if (!oom_reap_target) {
get_task_struct(p);
oom_reap_target = p;
trace_wake_reaper(p->pid);
wake_up(_reaper_wait);
}
 #endif
continue;
}
 #endif
/* We can wait as long as OOM score is decreasing over time. */
if (!victim_mm_stalling(p, mm))
continue;
gaveup = true;
list_del(>oom_victim_list);
/* Drop a reference taken by mark_oom_victim(). */
put_task_struct(p);
}
if (gaveup)
debug_show_all_locks();
 
return ret;
 }



Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/07 5:55, Michal Hocko wrote:
> On Tue 07-08-18 05:46:04, Tetsuo Handa wrote:
>> On 2018/08/07 5:34, Michal Hocko wrote:
>>> On Tue 07-08-18 05:26:23, Tetsuo Handa wrote:
 On 2018/08/07 2:56, Michal Hocko wrote:
> So the oom victim indeed passed the above force path after the oom
> invocation. But later on hit the page fault path and that behaved
> differently and for some reason the force path hasn't triggered. I am
> wondering how could we hit the page fault path in the first place. The
> task is already killed! So what the hell is going on here.
>
> I must be missing something obvious here.
>
 YOU ARE OBVIOUSLY MISSING MY MAIL!

 I already said this is "mm, oom: task_will_free_mem(current) should ignore 
 MMF_OOM_SKIP for once."
 problem which you are refusing at 
 https://www.spinics.net/lists/linux-mm/msg133774.html .
 And you again ignored my mail. Very sad...
>>>
>>> Your suggestion simply didn't make much sense. There is nothing like
>>> first check is different from the rest.
>>>
>>
>> I don't think your patch is appropriate. It avoids hitting WARN(1) but does 
>> not avoid
>> unnecessary killing of OOM victims.
>>
>> If you look at 
>> https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 , you will
>> notice that both 23766 and 23767 are killed due to 
>> task_will_free_mem(current) == false.
>> This is "unnecessary killing of additional processes".
> 
> Have you noticed the mere detail that the memcg has to kill any task
> attempting the charge because the hard limit is 0? There is simply no
> other way around. You cannot charge. There is no unnecessary killing.
> Full stop. We do allow temporary breach of the hard limit just to let
> the task die and uncharge on the way out.
> 

select_bad_process() is called just because
task_will_free_mem("already killed current thread which has not completed 
__mmput()") == false
is a bug. I'm saying that the OOM killer should not give up as soon as 
MMF_OOM_SKIP is set.

 static bool oom_has_pending_victims(struct oom_control *oc)
 {
struct task_struct *p, *tmp;
bool ret = false;
bool gaveup = false;
 
if (is_sysrq_oom(oc))
return false;
/*
 * Wait for pending victims until __mmput() completes or stalled
 * too long.
 */
list_for_each_entry_safe(p, tmp, _victim_list, oom_victim_list) {
struct mm_struct *mm = p->signal->oom_mm;
 
if (oom_unkillable_task(p, oc->memcg, oc->nodemask))
continue;
ret = true;
+   /*
+* Since memcg OOM allows forced charge, we can safely wait
+* until __mmput() completes.
+*/
+   if (is_memcg_oom(oc))
+   return true;
 #ifdef CONFIG_MMU
/*
 * Since the OOM reaper exists, we can safely wait until
 * MMF_OOM_SKIP is set.
 */
if (!test_bit(MMF_OOM_SKIP, >flags)) {
if (!oom_reap_target) {
get_task_struct(p);
oom_reap_target = p;
trace_wake_reaper(p->pid);
wake_up(_reaper_wait);
}
 #endif
continue;
}
 #endif
/* We can wait as long as OOM score is decreasing over time. */
if (!victim_mm_stalling(p, mm))
continue;
gaveup = true;
list_del(>oom_victim_list);
/* Drop a reference taken by mark_oom_victim(). */
put_task_struct(p);
}
if (gaveup)
debug_show_all_locks();
 
return ret;
 }



Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Tue 07-08-18 05:46:04, Tetsuo Handa wrote:
> On 2018/08/07 5:34, Michal Hocko wrote:
> > On Tue 07-08-18 05:26:23, Tetsuo Handa wrote:
> >> On 2018/08/07 2:56, Michal Hocko wrote:
> >>> So the oom victim indeed passed the above force path after the oom
> >>> invocation. But later on hit the page fault path and that behaved
> >>> differently and for some reason the force path hasn't triggered. I am
> >>> wondering how could we hit the page fault path in the first place. The
> >>> task is already killed! So what the hell is going on here.
> >>>
> >>> I must be missing something obvious here.
> >>>
> >> YOU ARE OBVIOUSLY MISSING MY MAIL!
> >>
> >> I already said this is "mm, oom: task_will_free_mem(current) should ignore 
> >> MMF_OOM_SKIP for once."
> >> problem which you are refusing at 
> >> https://www.spinics.net/lists/linux-mm/msg133774.html .
> >> And you again ignored my mail. Very sad...
> > 
> > Your suggestion simply didn't make much sense. There is nothing like
> > first check is different from the rest.
> > 
> 
> I don't think your patch is appropriate. It avoids hitting WARN(1) but does 
> not avoid
> unnecessary killing of OOM victims.
> 
> If you look at 
> https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 , you will
> notice that both 23766 and 23767 are killed due to 
> task_will_free_mem(current) == false.
> This is "unnecessary killing of additional processes".

Have you noticed the mere detail that the memcg has to kill any task
attempting the charge because the hard limit is 0? There is simply no
other way around. You cannot charge. There is no unnecessary killing.
Full stop. We do allow temporary breach of the hard limit just to let
the task die and uncharge on the way out.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Tue 07-08-18 05:46:04, Tetsuo Handa wrote:
> On 2018/08/07 5:34, Michal Hocko wrote:
> > On Tue 07-08-18 05:26:23, Tetsuo Handa wrote:
> >> On 2018/08/07 2:56, Michal Hocko wrote:
> >>> So the oom victim indeed passed the above force path after the oom
> >>> invocation. But later on hit the page fault path and that behaved
> >>> differently and for some reason the force path hasn't triggered. I am
> >>> wondering how could we hit the page fault path in the first place. The
> >>> task is already killed! So what the hell is going on here.
> >>>
> >>> I must be missing something obvious here.
> >>>
> >> YOU ARE OBVIOUSLY MISSING MY MAIL!
> >>
> >> I already said this is "mm, oom: task_will_free_mem(current) should ignore 
> >> MMF_OOM_SKIP for once."
> >> problem which you are refusing at 
> >> https://www.spinics.net/lists/linux-mm/msg133774.html .
> >> And you again ignored my mail. Very sad...
> > 
> > Your suggestion simply didn't make much sense. There is nothing like
> > first check is different from the rest.
> > 
> 
> I don't think your patch is appropriate. It avoids hitting WARN(1) but does 
> not avoid
> unnecessary killing of OOM victims.
> 
> If you look at 
> https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 , you will
> notice that both 23766 and 23767 are killed due to 
> task_will_free_mem(current) == false.
> This is "unnecessary killing of additional processes".

Have you noticed the mere detail that the memcg has to kill any task
attempting the charge because the hard limit is 0? There is simply no
other way around. You cannot charge. There is no unnecessary killing.
Full stop. We do allow temporary breach of the hard limit just to let
the task die and uncharge on the way out.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/07 5:34, Michal Hocko wrote:
> On Tue 07-08-18 05:26:23, Tetsuo Handa wrote:
>> On 2018/08/07 2:56, Michal Hocko wrote:
>>> So the oom victim indeed passed the above force path after the oom
>>> invocation. But later on hit the page fault path and that behaved
>>> differently and for some reason the force path hasn't triggered. I am
>>> wondering how could we hit the page fault path in the first place. The
>>> task is already killed! So what the hell is going on here.
>>>
>>> I must be missing something obvious here.
>>>
>> YOU ARE OBVIOUSLY MISSING MY MAIL!
>>
>> I already said this is "mm, oom: task_will_free_mem(current) should ignore 
>> MMF_OOM_SKIP for once."
>> problem which you are refusing at 
>> https://www.spinics.net/lists/linux-mm/msg133774.html .
>> And you again ignored my mail. Very sad...
> 
> Your suggestion simply didn't make much sense. There is nothing like
> first check is different from the rest.
> 

I don't think your patch is appropriate. It avoids hitting WARN(1) but does not 
avoid
unnecessary killing of OOM victims.

If you look at https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 
, you will
notice that both 23766 and 23767 are killed due to task_will_free_mem(current) 
== false.
This is "unnecessary killing of additional processes".

[  365.869417] syz-executor2 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), 
order=0, oom_score_adj=0
[  365.878899] CPU: 0 PID: 23767 Comm: syz-executor2 Not tainted 
4.18.0-rc6-next-20180725+ #18
(...snipped...)
[  366.487490] Tasks state (memory values in pages):
[  366.492349] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes swapents 
oom_score_adj name
[  366.501237] [  23766] 0 2376617620 8221   1269760
 0 syz-executor3
[  366.510367] [  23767] 0 2376717618 8218   1269760
 0 syz-executor2
[  366.519409] Memory cgroup out of memory: Kill process 23766 (syz-executor3) 
score 8252000 or sacrifice child
[  366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, 
anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB
[  366.540456] oom_reaper: reaped process 23766 (syz-executor3), now 
anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB
[  366.550949] syz-executor3 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), 
order=0, oom_score_adj=0
[  366.560374] CPU: 1 PID: 23766 Comm: syz-executor3 Not tainted 
4.18.0-rc6-next-20180725+ #18
(...snipped...)
[  367.138136] Tasks state (memory values in pages):
[  367.142986] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes swapents 
oom_score_adj name
[  367.151889] [  23766] 0 2376617620 8002   1269760
 0 syz-executor3
[  367.160946] [  23767] 0 2376717618 8218   1269760
 0 syz-executor2
[  367.169994] Memory cgroup out of memory: Kill process 23767 (syz-executor2) 
score 8249000 or sacrifice child
[  367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, 
anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB
[  367.192101] oom_reaper: reaped process 23767 (syz-executor2), now 
anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB
[  367.202986] [ cut here ]
[  367.207845] Memory cgroup charge failed because of no reclaimable memory! 
This looks like a misconfiguration or a kernel bug.
[  367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 
try_charge+0x734/0x1680
[  367.227540] Kernel panic - not syncing: panic_on_warn set ...



Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/07 5:34, Michal Hocko wrote:
> On Tue 07-08-18 05:26:23, Tetsuo Handa wrote:
>> On 2018/08/07 2:56, Michal Hocko wrote:
>>> So the oom victim indeed passed the above force path after the oom
>>> invocation. But later on hit the page fault path and that behaved
>>> differently and for some reason the force path hasn't triggered. I am
>>> wondering how could we hit the page fault path in the first place. The
>>> task is already killed! So what the hell is going on here.
>>>
>>> I must be missing something obvious here.
>>>
>> YOU ARE OBVIOUSLY MISSING MY MAIL!
>>
>> I already said this is "mm, oom: task_will_free_mem(current) should ignore 
>> MMF_OOM_SKIP for once."
>> problem which you are refusing at 
>> https://www.spinics.net/lists/linux-mm/msg133774.html .
>> And you again ignored my mail. Very sad...
> 
> Your suggestion simply didn't make much sense. There is nothing like
> first check is different from the rest.
> 

I don't think your patch is appropriate. It avoids hitting WARN(1) but does not 
avoid
unnecessary killing of OOM victims.

If you look at https://syzkaller.appspot.com/text?tag=CrashLog=15a1c77040 
, you will
notice that both 23766 and 23767 are killed due to task_will_free_mem(current) 
== false.
This is "unnecessary killing of additional processes".

[  365.869417] syz-executor2 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), 
order=0, oom_score_adj=0
[  365.878899] CPU: 0 PID: 23767 Comm: syz-executor2 Not tainted 
4.18.0-rc6-next-20180725+ #18
(...snipped...)
[  366.487490] Tasks state (memory values in pages):
[  366.492349] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes swapents 
oom_score_adj name
[  366.501237] [  23766] 0 2376617620 8221   1269760
 0 syz-executor3
[  366.510367] [  23767] 0 2376717618 8218   1269760
 0 syz-executor2
[  366.519409] Memory cgroup out of memory: Kill process 23766 (syz-executor3) 
score 8252000 or sacrifice child
[  366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, 
anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB
[  366.540456] oom_reaper: reaped process 23766 (syz-executor3), now 
anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB
[  366.550949] syz-executor3 invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), 
order=0, oom_score_adj=0
[  366.560374] CPU: 1 PID: 23766 Comm: syz-executor3 Not tainted 
4.18.0-rc6-next-20180725+ #18
(...snipped...)
[  367.138136] Tasks state (memory values in pages):
[  367.142986] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes swapents 
oom_score_adj name
[  367.151889] [  23766] 0 2376617620 8002   1269760
 0 syz-executor3
[  367.160946] [  23767] 0 2376717618 8218   1269760
 0 syz-executor2
[  367.169994] Memory cgroup out of memory: Kill process 23767 (syz-executor2) 
score 8249000 or sacrifice child
[  367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, 
anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB
[  367.192101] oom_reaper: reaped process 23767 (syz-executor2), now 
anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB
[  367.202986] [ cut here ]
[  367.207845] Memory cgroup charge failed because of no reclaimable memory! 
This looks like a misconfiguration or a kernel bug.
[  367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 
try_charge+0x734/0x1680
[  367.227540] Kernel panic - not syncing: panic_on_warn set ...



Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Tue 07-08-18 05:26:23, Tetsuo Handa wrote:
> On 2018/08/07 2:56, Michal Hocko wrote:
> > So the oom victim indeed passed the above force path after the oom
> > invocation. But later on hit the page fault path and that behaved
> > differently and for some reason the force path hasn't triggered. I am
> > wondering how could we hit the page fault path in the first place. The
> > task is already killed! So what the hell is going on here.
> > 
> > I must be missing something obvious here.
> > 
> YOU ARE OBVIOUSLY MISSING MY MAIL!
> 
> I already said this is "mm, oom: task_will_free_mem(current) should ignore 
> MMF_OOM_SKIP for once."
> problem which you are refusing at 
> https://www.spinics.net/lists/linux-mm/msg133774.html .
> And you again ignored my mail. Very sad...

Your suggestion simply didn't make much sense. There is nothing like
first check is different from the rest.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Tue 07-08-18 05:26:23, Tetsuo Handa wrote:
> On 2018/08/07 2:56, Michal Hocko wrote:
> > So the oom victim indeed passed the above force path after the oom
> > invocation. But later on hit the page fault path and that behaved
> > differently and for some reason the force path hasn't triggered. I am
> > wondering how could we hit the page fault path in the first place. The
> > task is already killed! So what the hell is going on here.
> > 
> > I must be missing something obvious here.
> > 
> YOU ARE OBVIOUSLY MISSING MY MAIL!
> 
> I already said this is "mm, oom: task_will_free_mem(current) should ignore 
> MMF_OOM_SKIP for once."
> problem which you are refusing at 
> https://www.spinics.net/lists/linux-mm/msg133774.html .
> And you again ignored my mail. Very sad...

Your suggestion simply didn't make much sense. There is nothing like
first check is different from the rest.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/07 2:56, Michal Hocko wrote:
> So the oom victim indeed passed the above force path after the oom
> invocation. But later on hit the page fault path and that behaved
> differently and for some reason the force path hasn't triggered. I am
> wondering how could we hit the page fault path in the first place. The
> task is already killed! So what the hell is going on here.
> 
> I must be missing something obvious here.
> 
YOU ARE OBVIOUSLY MISSING MY MAIL!

I already said this is "mm, oom: task_will_free_mem(current) should ignore 
MMF_OOM_SKIP for once."
problem which you are refusing at 
https://www.spinics.net/lists/linux-mm/msg133774.html .
And you again ignored my mail. Very sad...


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/07 2:56, Michal Hocko wrote:
> So the oom victim indeed passed the above force path after the oom
> invocation. But later on hit the page fault path and that behaved
> differently and for some reason the force path hasn't triggered. I am
> wondering how could we hit the page fault path in the first place. The
> task is already killed! So what the hell is going on here.
> 
> I must be missing something obvious here.
> 
YOU ARE OBVIOUSLY MISSING MY MAIL!

I already said this is "mm, oom: task_will_free_mem(current) should ignore 
MMF_OOM_SKIP for once."
problem which you are refusing at 
https://www.spinics.net/lists/linux-mm/msg133774.html .
And you again ignored my mail. Very sad...


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 21:45:53, Michal Hocko wrote:
> [CCing Greg - the email thread starts here
> http://lkml.kernel.org/r/5e979605729c1...@google.com]

now for real

> 
> On Mon 06-08-18 12:12:02, syzbot wrote:
> > Hello,
> > 
> > syzbot has tested the proposed patch and the reproducer did not trigger
> > crash:
> 
> OK, this is reassuring. Btw Greg has pointed out this potential case
> http://lkml.kernel.org/r/xr93in62jy8k@gthelen.svl.corp.google.com
> but I simply didn't get what he meant. He was suggesting MMF_OOM_SKIP
> but I didn't get why that matters. I didn't think about a race.
> 
> So how about this patch:
> From 74d980f8d066d06ada657ebf9b586dbf5668ed26 Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Mon, 6 Aug 2018 21:21:24 +0200
> Subject: [PATCH] memcg, oom: be careful about races when warning about no
>  reclaimable task
> 
> "memcg, oom: move out_of_memory back to the charge path" has added a
> warning triggered when the oom killer cannot find any eligible task
> and so there is no way to reclaim the oom memcg under its hard limit.
> Further charges for such a memcg are forced and therefore the hard limit
> isolation is weakened.
> 
> The current warning is however too eager to trigger  even when we are not
> really hitting the above condition. Syzbot and Greg Thelen have noticed
> that we can hit this condition even when there is still oom victim
> pending. E.g. the following race is possible:
> 
> memcg has two tasks taskA, taskB.
> 
> CPU1 (taskA)  CPU2CPU3 (taskB)
> try_charge
>   mem_cgroup_out_of_memorytry_charge
>   select_bad_process(taskB)
>   oom_kill_processoom_reap_task
>   # No real memory reaped
> 
> mem_cgroup_out_of_memory
>   # set taskB -> MMF_OOM_SKIP
>   # retry charge
>   mem_cgroup_out_of_memory
> oom_lock  oom_lock
> select_bad_process(self)
> oom_kill_process(self)
> oom_unlock
>   # no eligible task
> 
> In fact syzbot test triggered this situation by placing multiple tasks
> into a memcg with hard limit set to 0. So no task really had any memory
> charged to the memcg
> 
> : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB 
> mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
> active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
> : Tasks state (memory values in pages):
> : [  pid  ]   uid  tgid total_vm  rss pgtables_bytes swapents 
> oom_score_adj name
> : [   6569] 0  6562 94271532480 0 
> syz-executor0
> : [   6576] 0  6576 94260614400 0 
> syz-executor6
> : [   6578] 0  6578 9426  534614400 0 
> syz-executor4
> : [   6579] 0  6579 94260573440 0 
> syz-executor5
> : [   6582] 0  6582 94260614400 0 
> syz-executor7
> : [   6584] 0  6584 94260573440 0 
> syz-executor1
> 
> so in principle there is indeed nothing reclaimable in this memcg and
> this looks like a misconfiguration. On the other hand we can clearly
> kill all those tasks so it is a bit early to warn and scare users. Do
> that by checking that the current is the oom victim and bypass the
> warning then. The victim is allowed to force charge and terminate to
> release its temporal charge along the way.
> 
> Fixes: "memcg, oom: move out_of_memory back to the charge path"
> Noticed-by: Greg Thelen 
> Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
> Signed-off-by: Michal Hocko 
> ---
>  mm/memcontrol.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4603ad75c9a9..1b6eed1bc404 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup 
> *memcg, gfp_t mask, int
>   return OOM_ASYNC;
>   }
>  
> - if (mem_cgroup_out_of_memory(memcg, mask, order))
> + if (mem_cgroup_out_of_memory(memcg, mask, order) ||
> + tsk_is_oom_victim(current))
>   return OOM_SUCCESS;
>  
>   WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
> -- 
> 2.18.0
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 21:45:53, Michal Hocko wrote:
> [CCing Greg - the email thread starts here
> http://lkml.kernel.org/r/5e979605729c1...@google.com]

now for real

> 
> On Mon 06-08-18 12:12:02, syzbot wrote:
> > Hello,
> > 
> > syzbot has tested the proposed patch and the reproducer did not trigger
> > crash:
> 
> OK, this is reassuring. Btw Greg has pointed out this potential case
> http://lkml.kernel.org/r/xr93in62jy8k@gthelen.svl.corp.google.com
> but I simply didn't get what he meant. He was suggesting MMF_OOM_SKIP
> but I didn't get why that matters. I didn't think about a race.
> 
> So how about this patch:
> From 74d980f8d066d06ada657ebf9b586dbf5668ed26 Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Mon, 6 Aug 2018 21:21:24 +0200
> Subject: [PATCH] memcg, oom: be careful about races when warning about no
>  reclaimable task
> 
> "memcg, oom: move out_of_memory back to the charge path" has added a
> warning triggered when the oom killer cannot find any eligible task
> and so there is no way to reclaim the oom memcg under its hard limit.
> Further charges for such a memcg are forced and therefore the hard limit
> isolation is weakened.
> 
> The current warning is however too eager to trigger  even when we are not
> really hitting the above condition. Syzbot and Greg Thelen have noticed
> that we can hit this condition even when there is still oom victim
> pending. E.g. the following race is possible:
> 
> memcg has two tasks taskA, taskB.
> 
> CPU1 (taskA)  CPU2CPU3 (taskB)
> try_charge
>   mem_cgroup_out_of_memorytry_charge
>   select_bad_process(taskB)
>   oom_kill_processoom_reap_task
>   # No real memory reaped
> 
> mem_cgroup_out_of_memory
>   # set taskB -> MMF_OOM_SKIP
>   # retry charge
>   mem_cgroup_out_of_memory
> oom_lock  oom_lock
> select_bad_process(self)
> oom_kill_process(self)
> oom_unlock
>   # no eligible task
> 
> In fact syzbot test triggered this situation by placing multiple tasks
> into a memcg with hard limit set to 0. So no task really had any memory
> charged to the memcg
> 
> : Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB 
> mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
> active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
> : Tasks state (memory values in pages):
> : [  pid  ]   uid  tgid total_vm  rss pgtables_bytes swapents 
> oom_score_adj name
> : [   6569] 0  6562 94271532480 0 
> syz-executor0
> : [   6576] 0  6576 94260614400 0 
> syz-executor6
> : [   6578] 0  6578 9426  534614400 0 
> syz-executor4
> : [   6579] 0  6579 94260573440 0 
> syz-executor5
> : [   6582] 0  6582 94260614400 0 
> syz-executor7
> : [   6584] 0  6584 94260573440 0 
> syz-executor1
> 
> so in principle there is indeed nothing reclaimable in this memcg and
> this looks like a misconfiguration. On the other hand we can clearly
> kill all those tasks so it is a bit early to warn and scare users. Do
> that by checking that the current is the oom victim and bypass the
> warning then. The victim is allowed to force charge and terminate to
> release its temporal charge along the way.
> 
> Fixes: "memcg, oom: move out_of_memory back to the charge path"
> Noticed-by: Greg Thelen 
> Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
> Signed-off-by: Michal Hocko 
> ---
>  mm/memcontrol.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4603ad75c9a9..1b6eed1bc404 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup 
> *memcg, gfp_t mask, int
>   return OOM_ASYNC;
>   }
>  
> - if (mem_cgroup_out_of_memory(memcg, mask, order))
> + if (mem_cgroup_out_of_memory(memcg, mask, order) ||
> + tsk_is_oom_victim(current))
>   return OOM_SUCCESS;
>  
>   WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
> -- 
> 2.18.0
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
[CCing Greg - the email thread starts here
http://lkml.kernel.org/r/5e979605729c1...@google.com]

On Mon 06-08-18 12:12:02, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch and the reproducer did not trigger
> crash:

OK, this is reassuring. Btw Greg has pointed out this potential case
http://lkml.kernel.org/r/xr93in62jy8k@gthelen.svl.corp.google.com
but I simply didn't get what he meant. He was suggesting MMF_OOM_SKIP
but I didn't get why that matters. I didn't think about a race.

So how about this patch:
>From 74d980f8d066d06ada657ebf9b586dbf5668ed26 Mon Sep 17 00:00:00 2001
From: Michal Hocko 
Date: Mon, 6 Aug 2018 21:21:24 +0200
Subject: [PATCH] memcg, oom: be careful about races when warning about no
 reclaimable task

"memcg, oom: move out_of_memory back to the charge path" has added a
warning triggered when the oom killer cannot find any eligible task
and so there is no way to reclaim the oom memcg under its hard limit.
Further charges for such a memcg are forced and therefore the hard limit
isolation is weakened.

The current warning is however too eager to trigger  even when we are not
really hitting the above condition. Syzbot and Greg Thelen have noticed
that we can hit this condition even when there is still oom victim
pending. E.g. the following race is possible:

memcg has two tasks taskA, taskB.

CPU1 (taskA)CPU2CPU3 (taskB)
try_charge
  mem_cgroup_out_of_memory  try_charge
  select_bad_process(taskB)
  oom_kill_process  oom_reap_task
# No real memory reaped
  
mem_cgroup_out_of_memory
# set taskB -> MMF_OOM_SKIP
  # retry charge
  mem_cgroup_out_of_memory
oom_lockoom_lock
select_bad_process(self)
oom_kill_process(self)
oom_unlock
# no eligible task

In fact syzbot test triggered this situation by placing multiple tasks
into a memcg with hard limit set to 0. So no task really had any memory
charged to the memcg

: Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB 
mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
: Tasks state (memory values in pages):
: [  pid  ]   uid  tgid total_vm  rss pgtables_bytes swapents oom_score_adj 
name
: [   6569] 0  6562 94271532480 0 
syz-executor0
: [   6576] 0  6576 94260614400 0 
syz-executor6
: [   6578] 0  6578 9426  534614400 0 
syz-executor4
: [   6579] 0  6579 94260573440 0 
syz-executor5
: [   6582] 0  6582 94260614400 0 
syz-executor7
: [   6584] 0  6584 94260573440 0 
syz-executor1

so in principle there is indeed nothing reclaimable in this memcg and
this looks like a misconfiguration. On the other hand we can clearly
kill all those tasks so it is a bit early to warn and scare users. Do
that by checking that the current is the oom victim and bypass the
warning then. The victim is allowed to force charge and terminate to
release its temporal charge along the way.

Fixes: "memcg, oom: move out_of_memory back to the charge path"
Noticed-by: Greg Thelen 
Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
Signed-off-by: Michal Hocko 
---
 mm/memcontrol.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..1b6eed1bc404 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup 
*memcg, gfp_t mask, int
return OOM_ASYNC;
}
 
-   if (mem_cgroup_out_of_memory(memcg, mask, order))
+   if (mem_cgroup_out_of_memory(memcg, mask, order) ||
+   tsk_is_oom_victim(current))
return OOM_SUCCESS;
 
WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
-- 
2.18.0

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
[CCing Greg - the email thread starts here
http://lkml.kernel.org/r/5e979605729c1...@google.com]

On Mon 06-08-18 12:12:02, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch and the reproducer did not trigger
> crash:

OK, this is reassuring. Btw Greg has pointed out this potential case
http://lkml.kernel.org/r/xr93in62jy8k@gthelen.svl.corp.google.com
but I simply didn't get what he meant. He was suggesting MMF_OOM_SKIP
but I didn't get why that matters. I didn't think about a race.

So how about this patch:
>From 74d980f8d066d06ada657ebf9b586dbf5668ed26 Mon Sep 17 00:00:00 2001
From: Michal Hocko 
Date: Mon, 6 Aug 2018 21:21:24 +0200
Subject: [PATCH] memcg, oom: be careful about races when warning about no
 reclaimable task

"memcg, oom: move out_of_memory back to the charge path" has added a
warning triggered when the oom killer cannot find any eligible task
and so there is no way to reclaim the oom memcg under its hard limit.
Further charges for such a memcg are forced and therefore the hard limit
isolation is weakened.

The current warning is however too eager to trigger  even when we are not
really hitting the above condition. Syzbot and Greg Thelen have noticed
that we can hit this condition even when there is still oom victim
pending. E.g. the following race is possible:

memcg has two tasks taskA, taskB.

CPU1 (taskA)CPU2CPU3 (taskB)
try_charge
  mem_cgroup_out_of_memory  try_charge
  select_bad_process(taskB)
  oom_kill_process  oom_reap_task
# No real memory reaped
  
mem_cgroup_out_of_memory
# set taskB -> MMF_OOM_SKIP
  # retry charge
  mem_cgroup_out_of_memory
oom_lockoom_lock
select_bad_process(self)
oom_kill_process(self)
oom_unlock
# no eligible task

In fact syzbot test triggered this situation by placing multiple tasks
into a memcg with hard limit set to 0. So no task really had any memory
charged to the memcg

: Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB shmem:0KB 
mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
: Tasks state (memory values in pages):
: [  pid  ]   uid  tgid total_vm  rss pgtables_bytes swapents oom_score_adj 
name
: [   6569] 0  6562 94271532480 0 
syz-executor0
: [   6576] 0  6576 94260614400 0 
syz-executor6
: [   6578] 0  6578 9426  534614400 0 
syz-executor4
: [   6579] 0  6579 94260573440 0 
syz-executor5
: [   6582] 0  6582 94260614400 0 
syz-executor7
: [   6584] 0  6584 94260573440 0 
syz-executor1

so in principle there is indeed nothing reclaimable in this memcg and
this looks like a misconfiguration. On the other hand we can clearly
kill all those tasks so it is a bit early to warn and scare users. Do
that by checking that the current is the oom victim and bypass the
warning then. The victim is allowed to force charge and terminate to
release its temporal charge along the way.

Fixes: "memcg, oom: move out_of_memory back to the charge path"
Noticed-by: Greg Thelen 
Reported-and-tested-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
Signed-off-by: Michal Hocko 
---
 mm/memcontrol.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..1b6eed1bc404 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup 
*memcg, gfp_t mask, int
return OOM_ASYNC;
}
 
-   if (mem_cgroup_out_of_memory(memcg, mask, order))
+   if (mem_cgroup_out_of_memory(memcg, mask, order) ||
+   tsk_is_oom_victim(current))
return OOM_SUCCESS;
 
WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
-- 
2.18.0

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread syzbot

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:


Reported-and-tested-by:  
syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com


Tested on:

commit: 116b181bb646 Add linux-next specific files for 20180803
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

kernel config:  https://syzkaller.appspot.com/x/.config?x=b4f38be7c2c519d5
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
patch:  https://syzkaller.appspot.com/x/patch.diff?x=147b3a7240

Note: testing is done by a robot and is best-effort only.


Re: WARNING in try_charge

2018-08-06 Thread syzbot

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:


Reported-and-tested-by:  
syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com


Tested on:

commit: 116b181bb646 Add linux-next specific files for 20180803
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

kernel config:  https://syzkaller.appspot.com/x/.config?x=b4f38be7c2c519d5
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
patch:  https://syzkaller.appspot.com/x/patch.diff?x=147b3a7240

Note: testing is done by a robot and is best-effort only.


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
The debugging patch was wrong but I guess I see it finally.
It's a race

: [   72.901666] Memory cgroup out of memory: Kill process 6584 (syz-executor1) 
score 55 or sacrifice child
: [   72.917037] Killed process 6584 (syz-executor1) total-vm:37704kB, 
anon-rss:2140kB, file-rss:0kB, shmem-rss:0kB
: [   72.927256] task=syz-executor5 pid=6581 charge bypass
: [   72.928046] oom_reaper: reaped process 6584 (syz-executor1), now 
anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
: [   72.932818] task=syz-executor6 pid=6576 invoked memcg oom killer. 
oom_victim=1
: [   72.942790] task=syz-executor5 pid=6581 charge for nr_pages=1
: [   72.949769] syz-executor6 invoked oom-killer: 
gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, 
oom_score_adj=0
: [   72.955606] task=syz-executor5 pid=6581 charge bypass
: [   72.967394] syz-executor6 cpuset=/ mems_allowed=0
: [   72.973175] task=syz-executor5 pid=6581 charge for nr_pages=1
: [...]
: [   73.534865] Task in /ile0 killed as a result of limit of /ile0
: [   73.540865] memory: usage 76kB, limit 0kB, failcnt 260
: [   73.546142] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
: [   73.552898] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
: [   73.559051] Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB 
shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
: [   73.578533] Tasks state (memory values in pages):
: [   73.583404] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
swapents oom_score_adj name
: [   73.592277] [   6569] 0  6562 94271532480  
   0 syz-executor0
: [   73.601299] [   6576] 0  6576 94260614400  
   0 syz-executor6
: [   73.610333] [   6578] 0  6578 9426  534614400  
   0 syz-executor4
: [   73.619381] [   6579] 0  6579 94260573440  
   0 syz-executor5
: [   73.628414] [   6582] 0  6582 94260614400  
   0 syz-executor7
: [   73.637441] [   6584] 0  6584 94260573440  
   0 syz-executor1
: [   73.646464] Memory cgroup out of memory: Kill process 6578 (syz-executor4) 
score 549000 or sacrifice child
: [   73.656295] task=syz-executor6 pid=6576 is oom victim now

This should be 6578 but we at least know that we are running in 6576
context so the we are setting the state from a remote context which
itself has been killed already

: [   73.661841] Killed process 6578 (syz-executor4) total-vm:37704kB, 
anon-rss:2136kB, file-rss:0kB, shmem-rss:0kB
: [   73.672035] task=syz-executor6 pid=6576 charge bypass
: [   73.672801] oom_reaper: reaped process 6578 (syz-executor4), now 
anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
: [   73.678829] task=syz-executor4 pid=6578 invoked memcg oom killer. 
oom_victim=1

and here the victim finally reached the oom path finally.

: [   73.687453] task=syz-executor6 pid=6576 charge for nr_pages=1
: [   73.694534] [ cut here ]
: [   73.700424] task=syz-executor6 pid=6576 charge bypass
: [   73.705175] Memory cgroup charge failed because of no reclaimable memory! 
This looks like a misconfiguration or a kernel bug.
: [   73.705321] WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 
try_charge+0xafa/0x1710

But there is nobody killable. So the oom kill happened _after_ our force
charge path. Therefore we should do the following regardless whether we
make tis warn or pr_$foo

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
116b181bb646afedd770985de20a68721bdb2648

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..1b6eed1bc404 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup 
*memcg, gfp_t mask, int
return OOM_ASYNC;
}
 
-   if (mem_cgroup_out_of_memory(memcg, mask, order))
+   if (mem_cgroup_out_of_memory(memcg, mask, order) ||
+   tsk_is_oom_victim(current))
return OOM_SUCCESS;
 
WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
The debugging patch was wrong but I guess I see it finally.
It's a race

: [   72.901666] Memory cgroup out of memory: Kill process 6584 (syz-executor1) 
score 55 or sacrifice child
: [   72.917037] Killed process 6584 (syz-executor1) total-vm:37704kB, 
anon-rss:2140kB, file-rss:0kB, shmem-rss:0kB
: [   72.927256] task=syz-executor5 pid=6581 charge bypass
: [   72.928046] oom_reaper: reaped process 6584 (syz-executor1), now 
anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
: [   72.932818] task=syz-executor6 pid=6576 invoked memcg oom killer. 
oom_victim=1
: [   72.942790] task=syz-executor5 pid=6581 charge for nr_pages=1
: [   72.949769] syz-executor6 invoked oom-killer: 
gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, 
oom_score_adj=0
: [   72.955606] task=syz-executor5 pid=6581 charge bypass
: [   72.967394] syz-executor6 cpuset=/ mems_allowed=0
: [   72.973175] task=syz-executor5 pid=6581 charge for nr_pages=1
: [...]
: [   73.534865] Task in /ile0 killed as a result of limit of /ile0
: [   73.540865] memory: usage 76kB, limit 0kB, failcnt 260
: [   73.546142] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
: [   73.552898] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
: [   73.559051] Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB 
shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
: [   73.578533] Tasks state (memory values in pages):
: [   73.583404] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
swapents oom_score_adj name
: [   73.592277] [   6569] 0  6562 94271532480  
   0 syz-executor0
: [   73.601299] [   6576] 0  6576 94260614400  
   0 syz-executor6
: [   73.610333] [   6578] 0  6578 9426  534614400  
   0 syz-executor4
: [   73.619381] [   6579] 0  6579 94260573440  
   0 syz-executor5
: [   73.628414] [   6582] 0  6582 94260614400  
   0 syz-executor7
: [   73.637441] [   6584] 0  6584 94260573440  
   0 syz-executor1
: [   73.646464] Memory cgroup out of memory: Kill process 6578 (syz-executor4) 
score 549000 or sacrifice child
: [   73.656295] task=syz-executor6 pid=6576 is oom victim now

This should be 6578 but we at least know that we are running in 6576
context so the we are setting the state from a remote context which
itself has been killed already

: [   73.661841] Killed process 6578 (syz-executor4) total-vm:37704kB, 
anon-rss:2136kB, file-rss:0kB, shmem-rss:0kB
: [   73.672035] task=syz-executor6 pid=6576 charge bypass
: [   73.672801] oom_reaper: reaped process 6578 (syz-executor4), now 
anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
: [   73.678829] task=syz-executor4 pid=6578 invoked memcg oom killer. 
oom_victim=1

and here the victim finally reached the oom path finally.

: [   73.687453] task=syz-executor6 pid=6576 charge for nr_pages=1
: [   73.694534] [ cut here ]
: [   73.700424] task=syz-executor6 pid=6576 charge bypass
: [   73.705175] Memory cgroup charge failed because of no reclaimable memory! 
This looks like a misconfiguration or a kernel bug.
: [   73.705321] WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 
try_charge+0xafa/0x1710

But there is nobody killable. So the oom kill happened _after_ our force
charge path. Therefore we should do the following regardless whether we
make tis warn or pr_$foo

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
116b181bb646afedd770985de20a68721bdb2648

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..1b6eed1bc404 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1703,7 +1703,8 @@ static enum oom_status mem_cgroup_oom(struct mem_cgroup 
*memcg, gfp_t mask, int
return OOM_ASYNC;
}
 
-   if (mem_cgroup_out_of_memory(memcg, mask, order))
+   if (mem_cgroup_out_of_memory(memcg, mask, order) ||
+   tsk_is_oom_victim(current))
return OOM_SUCCESS;
 
WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 20:13:39, Michal Hocko wrote:
> I simply do not see how this is possible. Let's try with the following
> extended debugging patch.
> 
> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
> 116b181bb646afedd770985de20a68721bdb2648
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4603ad75c9a9..e2dfdf4361ba 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup 
> *memcg, gfp_t gfp_mask,
>   bool ret;
>  
>   mutex_lock(_lock);
> + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
> + current->comm, current->pid, 
> tsk_is_oom_victim(current));
>   ret = out_of_memory();
>   mutex_unlock(_lock);
>   return ret;
> @@ -2108,6 +2110,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t 
> gfp_mask,
>  
>   if (mem_cgroup_is_root(memcg))
>   return 0;
> + if (tsk_is_oom_victim(current))
> + pr_info("task=%s pid=%d charge for nr_pages=%d\n",
> + current->comm, current->pid, nr_pages);
>  retry:
>   if (consume_stock(memcg, nr_pages))
>   return 0;
> @@ -2137,8 +2142,11 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t 
> gfp_mask,
>*/
>   if (unlikely(tsk_is_oom_victim(current) ||
>fatal_signal_pending(current) ||
> -  current->flags & PF_EXITING))
> +  current->flags & PF_EXITING)) {
> + pr_info("task=%s pid=%d charge bypass\n",
> + current->comm, current->pid);
>   goto force;
> + }
>  
>   /*
>* Prevent unbounded recursion when reclaim operations need to
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 104ef4a01a55..7d9adcde8cf6 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -692,6 +692,8 @@ static void mark_oom_victim(struct task_struct *tsk)
>   __thaw_task(tsk);
>   atomic_inc(_victims);
>   trace_mark_victim(tsk->pid);
> + pr_info("task=%s pid=%d is oom victim now\n",
> + current->comm, current->pid);
>  }

Ble this should be s@current@tsk@g. But it didn't really help. I will
have to think about this more.

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 20:13:39, Michal Hocko wrote:
> I simply do not see how this is possible. Let's try with the following
> extended debugging patch.
> 
> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
> 116b181bb646afedd770985de20a68721bdb2648
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4603ad75c9a9..e2dfdf4361ba 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup 
> *memcg, gfp_t gfp_mask,
>   bool ret;
>  
>   mutex_lock(_lock);
> + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
> + current->comm, current->pid, 
> tsk_is_oom_victim(current));
>   ret = out_of_memory();
>   mutex_unlock(_lock);
>   return ret;
> @@ -2108,6 +2110,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t 
> gfp_mask,
>  
>   if (mem_cgroup_is_root(memcg))
>   return 0;
> + if (tsk_is_oom_victim(current))
> + pr_info("task=%s pid=%d charge for nr_pages=%d\n",
> + current->comm, current->pid, nr_pages);
>  retry:
>   if (consume_stock(memcg, nr_pages))
>   return 0;
> @@ -2137,8 +2142,11 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t 
> gfp_mask,
>*/
>   if (unlikely(tsk_is_oom_victim(current) ||
>fatal_signal_pending(current) ||
> -  current->flags & PF_EXITING))
> +  current->flags & PF_EXITING)) {
> + pr_info("task=%s pid=%d charge bypass\n",
> + current->comm, current->pid);
>   goto force;
> + }
>  
>   /*
>* Prevent unbounded recursion when reclaim operations need to
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 104ef4a01a55..7d9adcde8cf6 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -692,6 +692,8 @@ static void mark_oom_victim(struct task_struct *tsk)
>   __thaw_task(tsk);
>   atomic_inc(_victims);
>   trace_mark_victim(tsk->pid);
> + pr_info("task=%s pid=%d is oom victim now\n",
> + current->comm, current->pid);
>  }

Ble this should be s@current@tsk@g. But it didn't really help. I will
have to think about this more.

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread syzbot

Hello,

syzbot has tested the proposed patch but the reproducer still triggered  
crash:

WARNING in try_charge

task=syz-executor4 pid=6578 invoked memcg oom killer. oom_victim=1
task=syz-executor6 pid=6576 charge for nr_pages=1
[ cut here ]
task=syz-executor6 pid=6576 charge bypass
Memory cgroup charge failed because of no reclaimable memory! This looks  
like a misconfiguration or a kernel bug.
WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 mem_cgroup_oom  
mm/memcontrol.c:1706 [inline]
WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 try_charge+0xafa/0x1710  
mm/memcontrol.c:2270

task=syz-executor6 pid=6576 charge for nr_pages=1
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 6578 Comm: syz-executor4 Not tainted 4.18.0-rc7-next-20180803+  
#1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
task=syz-executor6 pid=6576 charge bypass
 panic+0x238/0x4e7 kernel/panic.c:184
task=syz-executor6 pid=6576 charge for nr_pages=1
 __warn.cold.8+0x163/0x1ba kernel/panic.c:536
task=syz-executor6 pid=6576 charge bypass
 report_bug+0x252/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
task=syz-executor6 pid=6576 charge for nr_pages=1
task=syz-executor6 pid=6576 charge bypass
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:996
RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline]
RIP: 0010:try_charge+0xafa/0x1710 mm/memcontrol.c:2270
task=syz-executor6 pid=6576 charge for nr_pages=1
Code: 85 4a 01 00 00 8b b5 bc fd ff ff 44 89 f2 4c 89 ff e8 3a 4d ff ff 84  
c0 0f 85 9b 04 00 00 48 c7 c7 a0 18 13 87 e8 86 f9 85 ff <0f> 0b 48 8b 95  
d0 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00

RSP: 0018:8801af5bf458 EFLAGS: 00010282
RAX:  RBX: dc00 RCX: 
RDX:  RSI: 816366f1 RDI: 8801af5bf148
task=syz-executor6 pid=6576 charge bypass
RBP: 8801af5bf6f0 R08: 8801aeae2540 R09: fbfff0ff11fc
R10: fbfff0ff11fc R11: 87f88fe3 R12: 8801b6bd4800
R13: 8801af5bf6c8 R14:  R15: 8801b6bd4800
task=syz-executor5 pid=6579 invoked memcg oom killer. oom_victim=1
[ cut here ]
Memory cgroup charge failed because of no reclaimable memory! This looks  
like a misconfiguration or a kernel bug.

 memcg_kmem_charge_memcg+0x7c/0x120 mm/memcontrol.c:2600
WARNING: CPU: 0 PID: 6579 at mm/memcontrol.c:1707 mem_cgroup_oom  
mm/memcontrol.c:1706 [inline]
WARNING: CPU: 0 PID: 6579 at mm/memcontrol.c:1707 try_charge+0xafa/0x1710  
mm/memcontrol.c:2270

 memcg_charge_slab mm/slab.h:283 [inline]
 kmem_getpages mm/slab.c:1415 [inline]
 cache_grow_begin+0x207/0x710 mm/slab.c:2677
Modules linked in:
 fallback_alloc+0x203/0x2c0 mm/slab.c:3219
CPU: 0 PID: 6579 Comm: syz-executor5 Not tainted 4.18.0-rc7-next-20180803+  
#1

 cache_alloc_node+0x1c7/0x1e0 mm/slab.c:3287
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

 __do_cache_alloc mm/slab.c:3356 [inline]
 slab_alloc mm/slab.c:3384 [inline]
 kmem_cache_alloc+0x1e5/0x760 mm/slab.c:3552
RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline]
RIP: 0010:try_charge+0xafa/0x1710 mm/memcontrol.c:2270
 anon_vma_chain_alloc mm/rmap.c:129 [inline]
 __anon_vma_prepare+0xc4/0x720 mm/rmap.c:183
Code: 85 4a 01 00 00 8b b5 bc fd ff ff 44 89 f2 4c 89 ff e8 3a 4d ff ff 84  
c0 0f 85 9b 04 00 00 48 c7 c7 a0 18 13 87 e8 86 f9 85 ff <0f> 0b 48 8b 95  
d0 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00

RSP: 0018:8801c82f7458 EFLAGS: 00010282
RAX:  RBX: dc00 RCX: 
RDX:  RSI: 816366f1 RDI: 8801c82f7148
RBP: 8801c82f76f0 R08: 8801ae6905c0 R09: fbfff0ff11fc
R10: fbfff0ff11fc R11: 87f88fe3 R12: 8801b6bd4800
 anon_vma_prepare include/linux/rmap.h:153 [inline]
 do_anonymous_page mm/memory.c:3160 [inline]
 handle_pte_fault mm/memory.c:3971 [inline]
 __handle_mm_fault+0x3556/0x4470 mm/memory.c:4097
R13: 8801c82f76c8 R14:  R15: 8801b6bd4800
FS:  00b19940() GS:8801db00() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 01dc8978 CR3: 0001c83bc000 CR4: 001406f0
Call Trace:
 memcg_kmem_charge_memcg+0x7c/0x120 mm/memcontrol.c:2600
 memcg_charge_slab mm/slab.h:283 [inline]
 kmem_getpages mm/slab.c:1415 [inline]
 cache_grow_begin+0x207/0x710 mm/slab.c:2677
 handle_mm_fault+0x53e/0xc80 mm/memory.c:4134
 fallback_alloc+0x203/0x2c0 mm/slab.c:3219
 cache_alloc_node+0x1c7/0x1e0 mm/slab.c:3287
 __do_page_fault+0x620/0xe50 arch/x86/mm/fault.c:1395
 __do_cache_alloc mm/slab.c:3356 [inline]
 slab_alloc mm/slab.c:3384 [inline]
 kmem_cache_alloc

Re: WARNING in try_charge

2018-08-06 Thread syzbot

Hello,

syzbot has tested the proposed patch but the reproducer still triggered  
crash:

WARNING in try_charge

task=syz-executor4 pid=6578 invoked memcg oom killer. oom_victim=1
task=syz-executor6 pid=6576 charge for nr_pages=1
[ cut here ]
task=syz-executor6 pid=6576 charge bypass
Memory cgroup charge failed because of no reclaimable memory! This looks  
like a misconfiguration or a kernel bug.
WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 mem_cgroup_oom  
mm/memcontrol.c:1706 [inline]
WARNING: CPU: 1 PID: 6578 at mm/memcontrol.c:1707 try_charge+0xafa/0x1710  
mm/memcontrol.c:2270

task=syz-executor6 pid=6576 charge for nr_pages=1
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 6578 Comm: syz-executor4 Not tainted 4.18.0-rc7-next-20180803+  
#1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
task=syz-executor6 pid=6576 charge bypass
 panic+0x238/0x4e7 kernel/panic.c:184
task=syz-executor6 pid=6576 charge for nr_pages=1
 __warn.cold.8+0x163/0x1ba kernel/panic.c:536
task=syz-executor6 pid=6576 charge bypass
 report_bug+0x252/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
task=syz-executor6 pid=6576 charge for nr_pages=1
task=syz-executor6 pid=6576 charge bypass
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:996
RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline]
RIP: 0010:try_charge+0xafa/0x1710 mm/memcontrol.c:2270
task=syz-executor6 pid=6576 charge for nr_pages=1
Code: 85 4a 01 00 00 8b b5 bc fd ff ff 44 89 f2 4c 89 ff e8 3a 4d ff ff 84  
c0 0f 85 9b 04 00 00 48 c7 c7 a0 18 13 87 e8 86 f9 85 ff <0f> 0b 48 8b 95  
d0 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00

RSP: 0018:8801af5bf458 EFLAGS: 00010282
RAX:  RBX: dc00 RCX: 
RDX:  RSI: 816366f1 RDI: 8801af5bf148
task=syz-executor6 pid=6576 charge bypass
RBP: 8801af5bf6f0 R08: 8801aeae2540 R09: fbfff0ff11fc
R10: fbfff0ff11fc R11: 87f88fe3 R12: 8801b6bd4800
R13: 8801af5bf6c8 R14:  R15: 8801b6bd4800
task=syz-executor5 pid=6579 invoked memcg oom killer. oom_victim=1
[ cut here ]
Memory cgroup charge failed because of no reclaimable memory! This looks  
like a misconfiguration or a kernel bug.

 memcg_kmem_charge_memcg+0x7c/0x120 mm/memcontrol.c:2600
WARNING: CPU: 0 PID: 6579 at mm/memcontrol.c:1707 mem_cgroup_oom  
mm/memcontrol.c:1706 [inline]
WARNING: CPU: 0 PID: 6579 at mm/memcontrol.c:1707 try_charge+0xafa/0x1710  
mm/memcontrol.c:2270

 memcg_charge_slab mm/slab.h:283 [inline]
 kmem_getpages mm/slab.c:1415 [inline]
 cache_grow_begin+0x207/0x710 mm/slab.c:2677
Modules linked in:
 fallback_alloc+0x203/0x2c0 mm/slab.c:3219
CPU: 0 PID: 6579 Comm: syz-executor5 Not tainted 4.18.0-rc7-next-20180803+  
#1

 cache_alloc_node+0x1c7/0x1e0 mm/slab.c:3287
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

 __do_cache_alloc mm/slab.c:3356 [inline]
 slab_alloc mm/slab.c:3384 [inline]
 kmem_cache_alloc+0x1e5/0x760 mm/slab.c:3552
RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline]
RIP: 0010:try_charge+0xafa/0x1710 mm/memcontrol.c:2270
 anon_vma_chain_alloc mm/rmap.c:129 [inline]
 __anon_vma_prepare+0xc4/0x720 mm/rmap.c:183
Code: 85 4a 01 00 00 8b b5 bc fd ff ff 44 89 f2 4c 89 ff e8 3a 4d ff ff 84  
c0 0f 85 9b 04 00 00 48 c7 c7 a0 18 13 87 e8 86 f9 85 ff <0f> 0b 48 8b 95  
d0 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00

RSP: 0018:8801c82f7458 EFLAGS: 00010282
RAX:  RBX: dc00 RCX: 
RDX:  RSI: 816366f1 RDI: 8801c82f7148
RBP: 8801c82f76f0 R08: 8801ae6905c0 R09: fbfff0ff11fc
R10: fbfff0ff11fc R11: 87f88fe3 R12: 8801b6bd4800
 anon_vma_prepare include/linux/rmap.h:153 [inline]
 do_anonymous_page mm/memory.c:3160 [inline]
 handle_pte_fault mm/memory.c:3971 [inline]
 __handle_mm_fault+0x3556/0x4470 mm/memory.c:4097
R13: 8801c82f76c8 R14:  R15: 8801b6bd4800
FS:  00b19940() GS:8801db00() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 01dc8978 CR3: 0001c83bc000 CR4: 001406f0
Call Trace:
 memcg_kmem_charge_memcg+0x7c/0x120 mm/memcontrol.c:2600
 memcg_charge_slab mm/slab.h:283 [inline]
 kmem_getpages mm/slab.c:1415 [inline]
 cache_grow_begin+0x207/0x710 mm/slab.c:2677
 handle_mm_fault+0x53e/0xc80 mm/memory.c:4134
 fallback_alloc+0x203/0x2c0 mm/slab.c:3219
 cache_alloc_node+0x1c7/0x1e0 mm/slab.c:3287
 __do_page_fault+0x620/0xe50 arch/x86/mm/fault.c:1395
 __do_cache_alloc mm/slab.c:3356 [inline]
 slab_alloc mm/slab.c:3384 [inline]
 kmem_cache_alloc

Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
I simply do not see how this is possible. Let's try with the following
extended debugging patch.

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
116b181bb646afedd770985de20a68721bdb2648

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..e2dfdf4361ba 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup 
*memcg, gfp_t gfp_mask,
bool ret;
 
mutex_lock(_lock);
+   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
+   current->comm, current->pid, 
tsk_is_oom_victim(current));
ret = out_of_memory();
mutex_unlock(_lock);
return ret;
@@ -2108,6 +2110,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t 
gfp_mask,
 
if (mem_cgroup_is_root(memcg))
return 0;
+   if (tsk_is_oom_victim(current))
+   pr_info("task=%s pid=%d charge for nr_pages=%d\n",
+   current->comm, current->pid, nr_pages);
 retry:
if (consume_stock(memcg, nr_pages))
return 0;
@@ -2137,8 +2142,11 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t 
gfp_mask,
 */
if (unlikely(tsk_is_oom_victim(current) ||
 fatal_signal_pending(current) ||
-current->flags & PF_EXITING))
+current->flags & PF_EXITING)) {
+   pr_info("task=%s pid=%d charge bypass\n",
+   current->comm, current->pid);
goto force;
+   }
 
/*
 * Prevent unbounded recursion when reclaim operations need to
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 104ef4a01a55..7d9adcde8cf6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -692,6 +692,8 @@ static void mark_oom_victim(struct task_struct *tsk)
__thaw_task(tsk);
atomic_inc(_victims);
trace_mark_victim(tsk->pid);
+   pr_info("task=%s pid=%d is oom victim now\n",
+   current->comm, current->pid);
 }
 
 /**
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
I simply do not see how this is possible. Let's try with the following
extended debugging patch.

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
116b181bb646afedd770985de20a68721bdb2648

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..e2dfdf4361ba 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup 
*memcg, gfp_t gfp_mask,
bool ret;
 
mutex_lock(_lock);
+   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
+   current->comm, current->pid, 
tsk_is_oom_victim(current));
ret = out_of_memory();
mutex_unlock(_lock);
return ret;
@@ -2108,6 +2110,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t 
gfp_mask,
 
if (mem_cgroup_is_root(memcg))
return 0;
+   if (tsk_is_oom_victim(current))
+   pr_info("task=%s pid=%d charge for nr_pages=%d\n",
+   current->comm, current->pid, nr_pages);
 retry:
if (consume_stock(memcg, nr_pages))
return 0;
@@ -2137,8 +2142,11 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t 
gfp_mask,
 */
if (unlikely(tsk_is_oom_victim(current) ||
 fatal_signal_pending(current) ||
-current->flags & PF_EXITING))
+current->flags & PF_EXITING)) {
+   pr_info("task=%s pid=%d charge bypass\n",
+   current->comm, current->pid);
goto force;
+   }
 
/*
 * Prevent unbounded recursion when reclaim operations need to
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 104ef4a01a55..7d9adcde8cf6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -692,6 +692,8 @@ static void mark_oom_victim(struct task_struct *tsk)
__thaw_task(tsk);
atomic_inc(_victims);
trace_mark_victim(tsk->pid);
+   pr_info("task=%s pid=%d is oom victim now\n",
+   current->comm, current->pid);
 }
 
 /**
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 19:44:10, Michal Hocko wrote:
> On Mon 06-08-18 08:42:02, syzbot wrote:
> > Hello,
> > 
> > syzbot has tested the proposed patch but the reproducer still triggered
> > crash:
> > WARNING in try_charge
> > 
> > Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB,
> > file-rss:0kB, shmem-rss:0kB
> > oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB,
> > file-rss:0kB, shmem-rss:0kB
> > task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1
> 
> Thank you. This is useful. The full oom picture is this
> : [   65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. 
> oom_victim=0
> [...]
> : [   65.920355] Task in /ile0 killed as a result of limit of /ile0
> : [   65.926389] memory: usage 0kB, limit 0kB, failcnt 20
> : [   65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB 
> rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB 
> inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB 
> unevictable:0KB
> : [   65.963878] Tasks state (memory values in pages):
> : [   65.968743] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
> swapents oom_score_adj name
> : [   65.977615] [   6410] 0  6410 9427  532614400
>  0 syz-executor5
> : [   65.986647] Memory cgroup out of memory: Kill process 6410 
> (syz-executor5) score 547000 or sacrifice child
> : [   65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, 
> anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB
> : [   66.007471] oom_reaper: reaped process 6410 (syz-executor5), now 
> anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> : [   66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. 
> oom_victim=1
> : [   66.025137] [ cut here ]
> : [   66.029927] Memory cgroup charge failed because of no reclaimable 
> memory! This looks like a misconfiguration or a kernel bug.
> : [   66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 
> try_charge+0x734/0x1680
> 
> So we have only a single task in the memcg and it is this task which
> triggers the OOM. It gets killed and oom_reaped. This means that
> out_of_memory should return with true and so we should retry and force
> the charge as I've already mentioned. For some reason this task has
> triggered the oom killer path again and then we haven't found any
> eligible task and resulted in the warning. This shouldn't happen.
> 
> I will stare to the code some more to see how the heck we get there
> without passing 
>   if (unlikely(tsk_is_oom_victim(current) ||
>fatal_signal_pending(current) ||
>current->flags & PF_EXITING))
>   goto force;

Hmm, so while the OOM killer was invoked from
[   65.405905] Call Trace:
[   65.408498]  dump_stack+0x1c9/0x2b4
[   65.421606]  dump_header+0x27b/0xf70
[   65.545094]  oom_kill_process.cold.28+0x10/0x95a
[   65.605696]  out_of_memory+0xa8a/0x14d0
[   65.627227]  mem_cgroup_out_of_memory+0x213/0x300
[   65.641293]  try_charge+0x720/0x1680
[   65.674806]  memcg_kmem_charge_memcg+0x7c/0x120
[   65.687939]  cache_grow_begin+0x207/0x710
[   65.696553]  fallback_alloc+0x203/0x2c0
[   65.700519]  cache_alloc_node+0x1c7/0x1e0
[   65.704999]  kmem_cache_alloc+0x1e5/0x760
[   65.717947]  shmem_alloc_inode+0x1b/0x40
[   65.722003]  alloc_inode+0x63/0x190
[   65.725642]  new_inode_pseudo+0x71/0x1a0
[   65.738077]  new_inode+0x1c/0x40
[   65.741432]  shmem_get_inode+0xf1/0x910
[   65.771550]  __shmem_file_setup.part.48+0x83/0x2a0
[   65.776482]  shmem_file_setup+0x65/0x90
[   65.780444]  __x64_sys_memfd_create+0x2af/0x4f0

The warning happened from a different path
[   66.151455] RIP: 0010:try_charge+0x734/0x1680
[...]
[   66.270886]  mem_cgroup_try_charge+0x4ff/0xa70
[   66.305602]  mem_cgroup_try_charge_delay+0x1d/0x90
[   66.310514]  __handle_mm_fault+0x25be/0x4470
[   66.366608]  handle_mm_fault+0x53e/0xc80
[   66.402384]  do_page_fault+0xf6/0x8c0
[   66.451629]  page_fault+0x1e/0x30

So the oom victim indeed passed the above force path after the oom
invocation. But later on hit the page fault path and that behaved
differently and for some reason the force path hasn't triggered. I am
wondering how could we hit the page fault path in the first place. The
task is already killed! So what the hell is going on here.

I must be missing something obvious here.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 19:44:10, Michal Hocko wrote:
> On Mon 06-08-18 08:42:02, syzbot wrote:
> > Hello,
> > 
> > syzbot has tested the proposed patch but the reproducer still triggered
> > crash:
> > WARNING in try_charge
> > 
> > Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB,
> > file-rss:0kB, shmem-rss:0kB
> > oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB,
> > file-rss:0kB, shmem-rss:0kB
> > task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1
> 
> Thank you. This is useful. The full oom picture is this
> : [   65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. 
> oom_victim=0
> [...]
> : [   65.920355] Task in /ile0 killed as a result of limit of /ile0
> : [   65.926389] memory: usage 0kB, limit 0kB, failcnt 20
> : [   65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB 
> rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB 
> inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB 
> unevictable:0KB
> : [   65.963878] Tasks state (memory values in pages):
> : [   65.968743] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
> swapents oom_score_adj name
> : [   65.977615] [   6410] 0  6410 9427  532614400
>  0 syz-executor5
> : [   65.986647] Memory cgroup out of memory: Kill process 6410 
> (syz-executor5) score 547000 or sacrifice child
> : [   65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, 
> anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB
> : [   66.007471] oom_reaper: reaped process 6410 (syz-executor5), now 
> anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> : [   66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. 
> oom_victim=1
> : [   66.025137] [ cut here ]
> : [   66.029927] Memory cgroup charge failed because of no reclaimable 
> memory! This looks like a misconfiguration or a kernel bug.
> : [   66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 
> try_charge+0x734/0x1680
> 
> So we have only a single task in the memcg and it is this task which
> triggers the OOM. It gets killed and oom_reaped. This means that
> out_of_memory should return with true and so we should retry and force
> the charge as I've already mentioned. For some reason this task has
> triggered the oom killer path again and then we haven't found any
> eligible task and resulted in the warning. This shouldn't happen.
> 
> I will stare to the code some more to see how the heck we get there
> without passing 
>   if (unlikely(tsk_is_oom_victim(current) ||
>fatal_signal_pending(current) ||
>current->flags & PF_EXITING))
>   goto force;

Hmm, so while the OOM killer was invoked from
[   65.405905] Call Trace:
[   65.408498]  dump_stack+0x1c9/0x2b4
[   65.421606]  dump_header+0x27b/0xf70
[   65.545094]  oom_kill_process.cold.28+0x10/0x95a
[   65.605696]  out_of_memory+0xa8a/0x14d0
[   65.627227]  mem_cgroup_out_of_memory+0x213/0x300
[   65.641293]  try_charge+0x720/0x1680
[   65.674806]  memcg_kmem_charge_memcg+0x7c/0x120
[   65.687939]  cache_grow_begin+0x207/0x710
[   65.696553]  fallback_alloc+0x203/0x2c0
[   65.700519]  cache_alloc_node+0x1c7/0x1e0
[   65.704999]  kmem_cache_alloc+0x1e5/0x760
[   65.717947]  shmem_alloc_inode+0x1b/0x40
[   65.722003]  alloc_inode+0x63/0x190
[   65.725642]  new_inode_pseudo+0x71/0x1a0
[   65.738077]  new_inode+0x1c/0x40
[   65.741432]  shmem_get_inode+0xf1/0x910
[   65.771550]  __shmem_file_setup.part.48+0x83/0x2a0
[   65.776482]  shmem_file_setup+0x65/0x90
[   65.780444]  __x64_sys_memfd_create+0x2af/0x4f0

The warning happened from a different path
[   66.151455] RIP: 0010:try_charge+0x734/0x1680
[...]
[   66.270886]  mem_cgroup_try_charge+0x4ff/0xa70
[   66.305602]  mem_cgroup_try_charge_delay+0x1d/0x90
[   66.310514]  __handle_mm_fault+0x25be/0x4470
[   66.366608]  handle_mm_fault+0x53e/0xc80
[   66.402384]  do_page_fault+0xf6/0x8c0
[   66.451629]  page_fault+0x1e/0x30

So the oom victim indeed passed the above force path after the oom
invocation. But later on hit the page fault path and that behaved
differently and for some reason the force path hasn't triggered. I am
wondering how could we hit the page fault path in the first place. The
task is already killed! So what the hell is going on here.

I must be missing something obvious here.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 7:30 PM, Michal Hocko  wrote:
>> >> >> A much
>> >> >> friendlier for user way to say this would be print a message at the
>> >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
>> >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
>> >> >> trace (does not give any useful info for user). And return EINVAL if
>> >> >> it can't fly at all? And then leave the "or a kernel bug" part for the
>> >> >> WARNING each occurrence of which we do want to be reported to kernel
>> >> >> developers.
>> >> >
>> >> > But this is not applicable here. Your misconfiguration is quite obvious
>> >> > because you simply set the hard limit to 0. This is not the only
>> >> > situation when this can happen. There is no clear point to tell, you are
>> >> > doing this wrong. If it was we would do it at that point obviously.
>> >>
>> >> But, isn't there a point were hard limit is set to 0? I would expect
>> >> there is a something like cgroup file write handler with a value of 0
>> >> or something.
>> >
>> > Yeah, but this is only one instance of the problem. Other is that the
>> > memcg is not reclaimable for any other reasons. And we do not know what
>> > those might be
>> >
>> >>
>> >> > If you have a strong reason to believe that this is an abuse of WARN I
>> >> > am all happy to change that. But I haven't heard any yet, to be honest.
>> >>
>> >> WARN must not be used for anything that is not kernel bugs. If this is
>> >> not kernel bug, WARN must not be used here.
>> >
>> > This is rather strong wording without any backing arguments. I strongly
>> > doubt 90% of existing WARN* match this expectation. WARN* has
>> > traditionally been a way to tell that something suspicious is going on.
>> > Those situation are mostly likely not fatal but it is good to know they
>> > are happening.
>> >
>> > Sure there is that panic_on_warn thingy which you seem to be using and I
>> > suspect it is a reason why you are so careful about warnings in general
>> > but my experience tells me that this configuration is barely usable
>> > except for testing (which is your case).
>> >
>> > But as I've said, I do not insist on WARN here. All I care about is to
>> > warn user that something might go south and this may be either due to
>> > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.
>>
>> I am a bit lost. Can limit=0 legally lead to the warnings? Or there is
>> also a kernel bug on top of that and it's actually a kernel bug that
>> provokes the warning?
>
> As I've tried to tell already. I cannot tell for sure. It is the killed
> oom victim which triggered thw warning and that shouldn't really
> happen. Considering this doesn't reproduce with the current linux next
> nor linus tree and the oom code has changed since the version you have
> tested then I would suspect there was something wrong with the memcg oom
> code. But maybe the test doesn't really reproduce reliably.
>
>> If it's a kernel bug, then I propose to stop arguing about
>> configuration and concentrate on the bug.
>> If it's just the misconfiguration that triggers the warning,  then can
>> we separate the 2 causes of the warning (user misconfiguration and
>> kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print
>> a line to console if necessary)? Or if the limit=0 is somehow not
>> possible/desirable to detect right away, check limit=0 at the point of
>> the warning and don't want?
>
> No we simply cannot. There is numerous situations when this can trigger.
> Say you set the hard limit to N and then try to fault in shmem file with
> the size >= N. No oom killer will help to reclaim memory. Or say you
> migrate the all tasks away from the memcg and then somebody triggers the
> memcg OOM in that group. There is simply nobody to kill. See the point?
> There is simply no direct contection between the configuration and
> actual problem. Too many things might happen between those two points.
> Let me repeat. We do warn because we want to hear if this happens. WARN
> tends to be a good way to get that attention. If you strongly believe
> this is an abuse I won't mind seeing a patch to turn it into something
> different.

I don't believe it is an abuse, I don't know this code well. Let's
assume the misconfiguration is a red-herring for now then.


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 7:30 PM, Michal Hocko  wrote:
>> >> >> A much
>> >> >> friendlier for user way to say this would be print a message at the
>> >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
>> >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
>> >> >> trace (does not give any useful info for user). And return EINVAL if
>> >> >> it can't fly at all? And then leave the "or a kernel bug" part for the
>> >> >> WARNING each occurrence of which we do want to be reported to kernel
>> >> >> developers.
>> >> >
>> >> > But this is not applicable here. Your misconfiguration is quite obvious
>> >> > because you simply set the hard limit to 0. This is not the only
>> >> > situation when this can happen. There is no clear point to tell, you are
>> >> > doing this wrong. If it was we would do it at that point obviously.
>> >>
>> >> But, isn't there a point were hard limit is set to 0? I would expect
>> >> there is a something like cgroup file write handler with a value of 0
>> >> or something.
>> >
>> > Yeah, but this is only one instance of the problem. Other is that the
>> > memcg is not reclaimable for any other reasons. And we do not know what
>> > those might be
>> >
>> >>
>> >> > If you have a strong reason to believe that this is an abuse of WARN I
>> >> > am all happy to change that. But I haven't heard any yet, to be honest.
>> >>
>> >> WARN must not be used for anything that is not kernel bugs. If this is
>> >> not kernel bug, WARN must not be used here.
>> >
>> > This is rather strong wording without any backing arguments. I strongly
>> > doubt 90% of existing WARN* match this expectation. WARN* has
>> > traditionally been a way to tell that something suspicious is going on.
>> > Those situation are mostly likely not fatal but it is good to know they
>> > are happening.
>> >
>> > Sure there is that panic_on_warn thingy which you seem to be using and I
>> > suspect it is a reason why you are so careful about warnings in general
>> > but my experience tells me that this configuration is barely usable
>> > except for testing (which is your case).
>> >
>> > But as I've said, I do not insist on WARN here. All I care about is to
>> > warn user that something might go south and this may be either due to
>> > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.
>>
>> I am a bit lost. Can limit=0 legally lead to the warnings? Or there is
>> also a kernel bug on top of that and it's actually a kernel bug that
>> provokes the warning?
>
> As I've tried to tell already. I cannot tell for sure. It is the killed
> oom victim which triggered thw warning and that shouldn't really
> happen. Considering this doesn't reproduce with the current linux next
> nor linus tree and the oom code has changed since the version you have
> tested then I would suspect there was something wrong with the memcg oom
> code. But maybe the test doesn't really reproduce reliably.
>
>> If it's a kernel bug, then I propose to stop arguing about
>> configuration and concentrate on the bug.
>> If it's just the misconfiguration that triggers the warning,  then can
>> we separate the 2 causes of the warning (user misconfiguration and
>> kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print
>> a line to console if necessary)? Or if the limit=0 is somehow not
>> possible/desirable to detect right away, check limit=0 at the point of
>> the warning and don't want?
>
> No we simply cannot. There is numerous situations when this can trigger.
> Say you set the hard limit to N and then try to fault in shmem file with
> the size >= N. No oom killer will help to reclaim memory. Or say you
> migrate the all tasks away from the memcg and then somebody triggers the
> memcg OOM in that group. There is simply nobody to kill. See the point?
> There is simply no direct contection between the configuration and
> actual problem. Too many things might happen between those two points.
> Let me repeat. We do warn because we want to hear if this happens. WARN
> tends to be a good way to get that attention. If you strongly believe
> this is an abuse I won't mind seeing a patch to turn it into something
> different.

I don't believe it is an abuse, I don't know this code well. Let's
assume the misconfiguration is a red-herring for now then.


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 7:44 PM, Michal Hocko  wrote:
> On Mon 06-08-18 08:42:02, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch but the reproducer still triggered
>> crash:
>> WARNING in try_charge
>>
>> Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB,
>> file-rss:0kB, shmem-rss:0kB
>> oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB,
>> file-rss:0kB, shmem-rss:0kB
>> task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1
>
> Thank you. This is useful. The full oom picture is this
> : [   65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. 
> oom_victim=0
> [...]
> : [   65.920355] Task in /ile0 killed as a result of limit of /ile0
> : [   65.926389] memory: usage 0kB, limit 0kB, failcnt 20
> : [   65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB 
> rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB 
> inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB 
> unevictable:0KB
> : [   65.963878] Tasks state (memory values in pages):
> : [   65.968743] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
> swapents oom_score_adj name
> : [   65.977615] [   6410] 0  6410 9427  532614400
>  0 syz-executor5
> : [   65.986647] Memory cgroup out of memory: Kill process 6410 
> (syz-executor5) score 547000 or sacrifice child
> : [   65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, 
> anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB
> : [   66.007471] oom_reaper: reaped process 6410 (syz-executor5), now 
> anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> : [   66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. 
> oom_victim=1
> : [   66.025137] [ cut here ]
> : [   66.029927] Memory cgroup charge failed because of no reclaimable 
> memory! This looks like a misconfiguration or a kernel bug.
> : [   66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 
> try_charge+0x734/0x1680
>
> So we have only a single task in the memcg and it is this task which
> triggers the OOM. It gets killed and oom_reaped. This means that
> out_of_memory should return with true and so we should retry and force
> the charge as I've already mentioned. For some reason this task has
> triggered the oom killer path again and then we haven't found any
> eligible task and resulted in the warning. This shouldn't happen.
>
> I will stare to the code some more to see how the heck we get there
> without passing
> if (unlikely(tsk_is_oom_victim(current) ||
>  fatal_signal_pending(current) ||
>  current->flags & PF_EXITING))
> goto force;


This is the syzkaller reproducer if it will give you any hints:

mkdir(&(0x7f000340)='./file0\x00', 0x0)
mount(&(0x7fc0)='./file0//ile0\x00',
&(0x7f80)='./file0\x00', &(0x7f000200)='cgroup2\x00', 0x0,
0x0)
r0 = open(&(0x7f40)='./file0//ile0\x00', 0x0, 0x0)
r1 = openat$cgroup_procs(r0, &(0x7f0001c0)='cgroup.procs\x00', 0x2, 0x0)
write$cgroup_pid(r1, &(0x7f000100), 0x12)
syz_mount_image$xfs(&(0x7f000280)='xfs\x00',
&(0x7f0002c0)='./file0//ile0\x00', 0x0, 0x0, &(0x7f000740),
0x0, &(0x7f000800))
getsockopt$sock_cred(r0, 0x1, 0x11, &(0x7f000180), &(0x7f000240)=0xc)
r2 = open(&(0x7f40)='./file0//ile0\x00', 0x0, 0x0)
write$FUSE_STATFS(r2, &(0x7f000280)={0x60, 0xfffe,
0x7, {{0x401, 0x7, 0x7d, 0x1c, 0xd8, 0xb7, 0x2, 0x3}}}, 0x60)
r3 = openat$cgroup_int(r2, &(0x7f0001c0)='memory.max\x00', 0x2, 0x0)
write$cgroup_int(r3, &(0x7fc0), 0x12)


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 7:44 PM, Michal Hocko  wrote:
> On Mon 06-08-18 08:42:02, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch but the reproducer still triggered
>> crash:
>> WARNING in try_charge
>>
>> Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB,
>> file-rss:0kB, shmem-rss:0kB
>> oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB,
>> file-rss:0kB, shmem-rss:0kB
>> task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1
>
> Thank you. This is useful. The full oom picture is this
> : [   65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. 
> oom_victim=0
> [...]
> : [   65.920355] Task in /ile0 killed as a result of limit of /ile0
> : [   65.926389] memory: usage 0kB, limit 0kB, failcnt 20
> : [   65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [   65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB 
> rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB 
> inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB 
> unevictable:0KB
> : [   65.963878] Tasks state (memory values in pages):
> : [   65.968743] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
> swapents oom_score_adj name
> : [   65.977615] [   6410] 0  6410 9427  532614400
>  0 syz-executor5
> : [   65.986647] Memory cgroup out of memory: Kill process 6410 
> (syz-executor5) score 547000 or sacrifice child
> : [   65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, 
> anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB
> : [   66.007471] oom_reaper: reaped process 6410 (syz-executor5), now 
> anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
> : [   66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. 
> oom_victim=1
> : [   66.025137] [ cut here ]
> : [   66.029927] Memory cgroup charge failed because of no reclaimable 
> memory! This looks like a misconfiguration or a kernel bug.
> : [   66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 
> try_charge+0x734/0x1680
>
> So we have only a single task in the memcg and it is this task which
> triggers the OOM. It gets killed and oom_reaped. This means that
> out_of_memory should return with true and so we should retry and force
> the charge as I've already mentioned. For some reason this task has
> triggered the oom killer path again and then we haven't found any
> eligible task and resulted in the warning. This shouldn't happen.
>
> I will stare to the code some more to see how the heck we get there
> without passing
> if (unlikely(tsk_is_oom_victim(current) ||
>  fatal_signal_pending(current) ||
>  current->flags & PF_EXITING))
> goto force;


This is the syzkaller reproducer if it will give you any hints:

mkdir(&(0x7f000340)='./file0\x00', 0x0)
mount(&(0x7fc0)='./file0//ile0\x00',
&(0x7f80)='./file0\x00', &(0x7f000200)='cgroup2\x00', 0x0,
0x0)
r0 = open(&(0x7f40)='./file0//ile0\x00', 0x0, 0x0)
r1 = openat$cgroup_procs(r0, &(0x7f0001c0)='cgroup.procs\x00', 0x2, 0x0)
write$cgroup_pid(r1, &(0x7f000100), 0x12)
syz_mount_image$xfs(&(0x7f000280)='xfs\x00',
&(0x7f0002c0)='./file0//ile0\x00', 0x0, 0x0, &(0x7f000740),
0x0, &(0x7f000800))
getsockopt$sock_cred(r0, 0x1, 0x11, &(0x7f000180), &(0x7f000240)=0xc)
r2 = open(&(0x7f40)='./file0//ile0\x00', 0x0, 0x0)
write$FUSE_STATFS(r2, &(0x7f000280)={0x60, 0xfffe,
0x7, {{0x401, 0x7, 0x7d, 0x1c, 0xd8, 0xb7, 0x2, 0x3}}}, 0x60)
r3 = openat$cgroup_int(r2, &(0x7f0001c0)='memory.max\x00', 0x2, 0x0)
write$cgroup_int(r3, &(0x7fc0), 0x12)


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 08:42:02, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer still triggered
> crash:
> WARNING in try_charge
> 
> Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB,
> file-rss:0kB, shmem-rss:0kB
> oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB,
> file-rss:0kB, shmem-rss:0kB
> task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1

Thank you. This is useful. The full oom picture is this
: [   65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. 
oom_victim=0
[...]
: [   65.920355] Task in /ile0 killed as a result of limit of /ile0
: [   65.926389] memory: usage 0kB, limit 0kB, failcnt 20
: [   65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
: [   65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
: [   65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB 
shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
: [   65.963878] Tasks state (memory values in pages):
: [   65.968743] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
swapents oom_score_adj name
: [   65.977615] [   6410] 0  6410 9427  532614400  
   0 syz-executor5
: [   65.986647] Memory cgroup out of memory: Kill process 6410 (syz-executor5) 
score 547000 or sacrifice child
: [   65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, 
anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB
: [   66.007471] oom_reaper: reaped process 6410 (syz-executor5), now 
anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
: [   66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. 
oom_victim=1
: [   66.025137] [ cut here ]
: [   66.029927] Memory cgroup charge failed because of no reclaimable memory! 
This looks like a misconfiguration or a kernel bug.
: [   66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 
try_charge+0x734/0x1680

So we have only a single task in the memcg and it is this task which
triggers the OOM. It gets killed and oom_reaped. This means that
out_of_memory should return with true and so we should retry and force
the charge as I've already mentioned. For some reason this task has
triggered the oom killer path again and then we haven't found any
eligible task and resulted in the warning. This shouldn't happen.

I will stare to the code some more to see how the heck we get there
without passing 
if (unlikely(tsk_is_oom_victim(current) ||
 fatal_signal_pending(current) ||
 current->flags & PF_EXITING))
goto force;
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 08:42:02, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer still triggered
> crash:
> WARNING in try_charge
> 
> Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB,
> file-rss:0kB, shmem-rss:0kB
> oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB,
> file-rss:0kB, shmem-rss:0kB
> task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1

Thank you. This is useful. The full oom picture is this
: [   65.363983] task=syz-executor5 pid=6415 invoked memcg oom killer. 
oom_victim=0
[...]
: [   65.920355] Task in /ile0 killed as a result of limit of /ile0
: [   65.926389] memory: usage 0kB, limit 0kB, failcnt 20
: [   65.931518] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
: [   65.938296] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
: [   65.944467] Memory cgroup stats for /ile0: cache:0KB rss:0KB rss_huge:0KB 
shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
: [   65.963878] Tasks state (memory values in pages):
: [   65.968743] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
swapents oom_score_adj name
: [   65.977615] [   6410] 0  6410 9427  532614400  
   0 syz-executor5
: [   65.986647] Memory cgroup out of memory: Kill process 6410 (syz-executor5) 
score 547000 or sacrifice child
: [   65.996474] Killed process 6410 (syz-executor5) total-vm:37708kB, 
anon-rss:2128kB, file-rss:0kB, shmem-rss:0kB
: [   66.007471] oom_reaper: reaped process 6410 (syz-executor5), now 
anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
: [   66.017652] task=syz-executor5 pid=6410 invoked memcg oom killer. 
oom_victim=1
: [   66.025137] [ cut here ]
: [   66.029927] Memory cgroup charge failed because of no reclaimable memory! 
This looks like a misconfiguration or a kernel bug.
: [   66.030061] WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 
try_charge+0x734/0x1680

So we have only a single task in the memcg and it is this task which
triggers the OOM. It gets killed and oom_reaped. This means that
out_of_memory should return with true and so we should retry and force
the charge as I've already mentioned. For some reason this task has
triggered the oom killer path again and then we haven't found any
eligible task and resulted in the warning. This shouldn't happen.

I will stare to the code some more to see how the heck we get there
without passing 
if (unlikely(tsk_is_oom_victim(current) ||
 fatal_signal_pending(current) ||
 current->flags & PF_EXITING))
goto force;
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 16:58:01, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko  wrote:
> > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
> >> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
> > [...]
> >> >> A much
> >> >> friendlier for user way to say this would be print a message at the
> >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
> >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
> >> >> trace (does not give any useful info for user). And return EINVAL if
> >> >> it can't fly at all? And then leave the "or a kernel bug" part for the
> >> >> WARNING each occurrence of which we do want to be reported to kernel
> >> >> developers.
> >> >
> >> > But this is not applicable here. Your misconfiguration is quite obvious
> >> > because you simply set the hard limit to 0. This is not the only
> >> > situation when this can happen. There is no clear point to tell, you are
> >> > doing this wrong. If it was we would do it at that point obviously.
> >>
> >> But, isn't there a point were hard limit is set to 0? I would expect
> >> there is a something like cgroup file write handler with a value of 0
> >> or something.
> >
> > Yeah, but this is only one instance of the problem. Other is that the
> > memcg is not reclaimable for any other reasons. And we do not know what
> > those might be
> >
> >>
> >> > If you have a strong reason to believe that this is an abuse of WARN I
> >> > am all happy to change that. But I haven't heard any yet, to be honest.
> >>
> >> WARN must not be used for anything that is not kernel bugs. If this is
> >> not kernel bug, WARN must not be used here.
> >
> > This is rather strong wording without any backing arguments. I strongly
> > doubt 90% of existing WARN* match this expectation. WARN* has
> > traditionally been a way to tell that something suspicious is going on.
> > Those situation are mostly likely not fatal but it is good to know they
> > are happening.
> >
> > Sure there is that panic_on_warn thingy which you seem to be using and I
> > suspect it is a reason why you are so careful about warnings in general
> > but my experience tells me that this configuration is barely usable
> > except for testing (which is your case).
> >
> > But as I've said, I do not insist on WARN here. All I care about is to
> > warn user that something might go south and this may be either due to
> > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.
> 
> I am a bit lost. Can limit=0 legally lead to the warnings? Or there is
> also a kernel bug on top of that and it's actually a kernel bug that
> provokes the warning?

As I've tried to tell already. I cannot tell for sure. It is the killed
oom victim which triggered thw warning and that shouldn't really
happen. Considering this doesn't reproduce with the current linux next
nor linus tree and the oom code has changed since the version you have
tested then I would suspect there was something wrong with the memcg oom
code. But maybe the test doesn't really reproduce reliably.

> If it's a kernel bug, then I propose to stop arguing about
> configuration and concentrate on the bug.
> If it's just the misconfiguration that triggers the warning,  then can
> we separate the 2 causes of the warning (user misconfiguration and
> kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print
> a line to console if necessary)? Or if the limit=0 is somehow not
> possible/desirable to detect right away, check limit=0 at the point of
> the warning and don't want?

No we simply cannot. There is numerous situations when this can trigger.
Say you set the hard limit to N and then try to fault in shmem file with
the size >= N. No oom killer will help to reclaim memory. Or say you
migrate the all tasks away from the memcg and then somebody triggers the
memcg OOM in that group. There is simply nobody to kill. See the point?
There is simply no direct contection between the configuration and
actual problem. Too many things might happen between those two points.
Let me repeat. We do warn because we want to hear if this happens. WARN
tends to be a good way to get that attention. If you strongly believe
this is an abuse I won't mind seeing a patch to turn it into something
different.

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 16:58:01, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko  wrote:
> > On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
> >> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
> > [...]
> >> >> A much
> >> >> friendlier for user way to say this would be print a message at the
> >> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
> >> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
> >> >> trace (does not give any useful info for user). And return EINVAL if
> >> >> it can't fly at all? And then leave the "or a kernel bug" part for the
> >> >> WARNING each occurrence of which we do want to be reported to kernel
> >> >> developers.
> >> >
> >> > But this is not applicable here. Your misconfiguration is quite obvious
> >> > because you simply set the hard limit to 0. This is not the only
> >> > situation when this can happen. There is no clear point to tell, you are
> >> > doing this wrong. If it was we would do it at that point obviously.
> >>
> >> But, isn't there a point were hard limit is set to 0? I would expect
> >> there is a something like cgroup file write handler with a value of 0
> >> or something.
> >
> > Yeah, but this is only one instance of the problem. Other is that the
> > memcg is not reclaimable for any other reasons. And we do not know what
> > those might be
> >
> >>
> >> > If you have a strong reason to believe that this is an abuse of WARN I
> >> > am all happy to change that. But I haven't heard any yet, to be honest.
> >>
> >> WARN must not be used for anything that is not kernel bugs. If this is
> >> not kernel bug, WARN must not be used here.
> >
> > This is rather strong wording without any backing arguments. I strongly
> > doubt 90% of existing WARN* match this expectation. WARN* has
> > traditionally been a way to tell that something suspicious is going on.
> > Those situation are mostly likely not fatal but it is good to know they
> > are happening.
> >
> > Sure there is that panic_on_warn thingy which you seem to be using and I
> > suspect it is a reason why you are so careful about warnings in general
> > but my experience tells me that this configuration is barely usable
> > except for testing (which is your case).
> >
> > But as I've said, I do not insist on WARN here. All I care about is to
> > warn user that something might go south and this may be either due to
> > misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.
> 
> I am a bit lost. Can limit=0 legally lead to the warnings? Or there is
> also a kernel bug on top of that and it's actually a kernel bug that
> provokes the warning?

As I've tried to tell already. I cannot tell for sure. It is the killed
oom victim which triggered thw warning and that shouldn't really
happen. Considering this doesn't reproduce with the current linux next
nor linus tree and the oom code has changed since the version you have
tested then I would suspect there was something wrong with the memcg oom
code. But maybe the test doesn't really reproduce reliably.

> If it's a kernel bug, then I propose to stop arguing about
> configuration and concentrate on the bug.
> If it's just the misconfiguration that triggers the warning,  then can
> we separate the 2 causes of the warning (user misconfiguration and
> kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print
> a line to console if necessary)? Or if the limit=0 is somehow not
> possible/desirable to detect right away, check limit=0 at the point of
> the warning and don't want?

No we simply cannot. There is numerous situations when this can trigger.
Say you set the hard limit to N and then try to fault in shmem file with
the size >= N. No oom killer will help to reclaim memory. Or say you
migrate the all tasks away from the memcg and then somebody triggers the
memcg OOM in that group. There is simply nobody to kill. See the point?
There is simply no direct contection between the configuration and
actual problem. Too many things might happen between those two points.
Let me repeat. We do warn because we want to hear if this happens. WARN
tends to be a good way to get that attention. If you strongly believe
this is an abuse I won't mind seeing a patch to turn it into something
different.

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/07 0:42, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer still triggered crash:
> WARNING in try_charge
> 
> Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, 
> file-rss:0kB, shmem-rss:0kB
> oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, 
> file-rss:0kB, shmem-rss:0kB
> task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1
> [ cut here ]
> Memory cgroup charge failed because of no reclaimable memory! This looks like 
> a misconfiguration or a kernel bug.
> WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 mem_cgroup_oom 
> mm/memcontrol.c:1706 [inline]
> WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 try_charge+0x734/0x1680 
> mm/memcontrol.c:2264
> Kernel panic - not syncing: panic_on_warn set ...

Michal, this is "mm, oom: task_will_free_mem(current) should ignore 
MMF_OOM_SKIP for once."
problem which you are refusing at 
https://www.spinics.net/lists/linux-mm/msg133774.html .


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/07 0:42, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch but the reproducer still triggered crash:
> WARNING in try_charge
> 
> Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB, 
> file-rss:0kB, shmem-rss:0kB
> oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB, 
> file-rss:0kB, shmem-rss:0kB
> task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1
> [ cut here ]
> Memory cgroup charge failed because of no reclaimable memory! This looks like 
> a misconfiguration or a kernel bug.
> WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 mem_cgroup_oom 
> mm/memcontrol.c:1706 [inline]
> WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 try_charge+0x734/0x1680 
> mm/memcontrol.c:2264
> Kernel panic - not syncing: panic_on_warn set ...

Michal, this is "mm, oom: task_will_free_mem(current) should ignore 
MMF_OOM_SKIP for once."
problem which you are refusing at 
https://www.spinics.net/lists/linux-mm/msg133774.html .


Re: WARNING in try_charge

2018-08-06 Thread syzbot

Hello,

syzbot has tested the proposed patch but the reproducer still triggered  
crash:

WARNING in try_charge

Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB,  
file-rss:0kB, shmem-rss:0kB
oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB,  
file-rss:0kB, shmem-rss:0kB

task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1
[ cut here ]
Memory cgroup charge failed because of no reclaimable memory! This looks  
like a misconfiguration or a kernel bug.
WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 mem_cgroup_oom  
mm/memcontrol.c:1706 [inline]
WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 try_charge+0x734/0x1680  
mm/memcontrol.c:2264

Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 6410 Comm: syz-executor5 Not tainted 4.18.0-rc7-next-20180803+  
#1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 panic+0x238/0x4e7 kernel/panic.c:184
 __warn.cold.8+0x163/0x1ba kernel/panic.c:536
 report_bug+0x252/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:996
RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline]
RIP: 0010:try_charge+0x734/0x1680 mm/memcontrol.c:2264
Code: 85 b8 04 00 00 8b b5 c8 fd ff ff 44 89 f2 4c 89 ff e8 00 51 ff ff 84  
c0 0f 85 31 08 00 00 48 c7 c7 c0 17 13 87 e8 8c fd 85 ff <0f> 0b 48 8d 95  
f8 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00

RSP: 0018:8801be6ef580 EFLAGS: 00010286
RAX:  RBX: 8801afab4c00 RCX: 
RDX:  RSI: 816366f1 RDI: 8801be6ef270
RBP: 8801be6ef810 R08: 8801bcf48140 R09: fbfff0ff1238
R10: fbfff0ff1238 R11: 87f891c3 R12: dc00
R13: 8801be6ef7e8 R14:  R15: 8801afab4c00
 mem_cgroup_try_charge+0x4ff/0xa70 mm/memcontrol.c:5916
 mem_cgroup_try_charge_delay+0x1d/0x90 mm/memcontrol.c:5931
 do_anonymous_page mm/memory.c:3166 [inline]
 handle_pte_fault mm/memory.c:3971 [inline]
 __handle_mm_fault+0x25be/0x4470 mm/memory.c:4097
 handle_mm_fault+0x53e/0xc80 mm/memory.c:4134
 __do_page_fault+0x620/0xe50 arch/x86/mm/fault.c:1395
 do_page_fault+0xf6/0x8c0 arch/x86/mm/fault.c:1470
 page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1164
RIP: 0033:0x40e33f
Code: Bad RIP value.
RSP: 002b:7ffe221246e0 EFLAGS: 00010206
RAX: 7fb83d11b000 RBX: 0002 RCX: 00456b7a
RDX: 00021000 RSI: 00021000 RDI: 
RBP: 7ffe221247c0 R08:  R09: 
R10:  R11: 0246 R12: 7ffe221248b0
R13: 7fb83d13b700 R14: 0005 R15: 0001
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


Tested on:

commit: 116b181bb646 Add linux-next specific files for 20180803
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

console output: https://syzkaller.appspot.com/x/log.txt?x=1244786440
kernel config:  https://syzkaller.appspot.com/x/.config?x=b4f38be7c2c519d5
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
patch:  https://syzkaller.appspot.com/x/patch.diff?x=14c5b68c40



Re: WARNING in try_charge

2018-08-06 Thread syzbot

Hello,

syzbot has tested the proposed patch but the reproducer still triggered  
crash:

WARNING in try_charge

Killed process 6410 (syz-executor5) total-vm:37708kB, anon-rss:2128kB,  
file-rss:0kB, shmem-rss:0kB
oom_reaper: reaped process 6410 (syz-executor5), now anon-rss:0kB,  
file-rss:0kB, shmem-rss:0kB

task=syz-executor5 pid=6410 invoked memcg oom killer. oom_victim=1
[ cut here ]
Memory cgroup charge failed because of no reclaimable memory! This looks  
like a misconfiguration or a kernel bug.
WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 mem_cgroup_oom  
mm/memcontrol.c:1706 [inline]
WARNING: CPU: 1 PID: 6410 at mm/memcontrol.c:1707 try_charge+0x734/0x1680  
mm/memcontrol.c:2264

Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 6410 Comm: syz-executor5 Not tainted 4.18.0-rc7-next-20180803+  
#1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 panic+0x238/0x4e7 kernel/panic.c:184
 __warn.cold.8+0x163/0x1ba kernel/panic.c:536
 report_bug+0x252/0x2d0 lib/bug.c:186
 fixup_bug arch/x86/kernel/traps.c:178 [inline]
 do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:996
RIP: 0010:mem_cgroup_oom mm/memcontrol.c:1706 [inline]
RIP: 0010:try_charge+0x734/0x1680 mm/memcontrol.c:2264
Code: 85 b8 04 00 00 8b b5 c8 fd ff ff 44 89 f2 4c 89 ff e8 00 51 ff ff 84  
c0 0f 85 31 08 00 00 48 c7 c7 c0 17 13 87 e8 8c fd 85 ff <0f> 0b 48 8d 95  
f8 fd ff ff 48 8b b5 c0 fd ff ff 48 b8 00 00 00 00

RSP: 0018:8801be6ef580 EFLAGS: 00010286
RAX:  RBX: 8801afab4c00 RCX: 
RDX:  RSI: 816366f1 RDI: 8801be6ef270
RBP: 8801be6ef810 R08: 8801bcf48140 R09: fbfff0ff1238
R10: fbfff0ff1238 R11: 87f891c3 R12: dc00
R13: 8801be6ef7e8 R14:  R15: 8801afab4c00
 mem_cgroup_try_charge+0x4ff/0xa70 mm/memcontrol.c:5916
 mem_cgroup_try_charge_delay+0x1d/0x90 mm/memcontrol.c:5931
 do_anonymous_page mm/memory.c:3166 [inline]
 handle_pte_fault mm/memory.c:3971 [inline]
 __handle_mm_fault+0x25be/0x4470 mm/memory.c:4097
 handle_mm_fault+0x53e/0xc80 mm/memory.c:4134
 __do_page_fault+0x620/0xe50 arch/x86/mm/fault.c:1395
 do_page_fault+0xf6/0x8c0 arch/x86/mm/fault.c:1470
 page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1164
RIP: 0033:0x40e33f
Code: Bad RIP value.
RSP: 002b:7ffe221246e0 EFLAGS: 00010206
RAX: 7fb83d11b000 RBX: 0002 RCX: 00456b7a
RDX: 00021000 RSI: 00021000 RDI: 
RBP: 7ffe221247c0 R08:  R09: 
R10:  R11: 0246 R12: 7ffe221248b0
R13: 7fb83d13b700 R14: 0005 R15: 0001
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


Tested on:

commit: 116b181bb646 Add linux-next specific files for 20180803
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

console output: https://syzkaller.appspot.com/x/log.txt?x=1244786440
kernel config:  https://syzkaller.appspot.com/x/.config?x=b4f38be7c2c519d5
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
patch:  https://syzkaller.appspot.com/x/patch.diff?x=14c5b68c40



Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
Now, let's test next-20180803 .

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
116b181bb646afedd770985de20a68721bdb2648

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..852cd3dbdcd9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup 
*memcg, gfp_t gfp_mask,
bool ret;
 
mutex_lock(_lock);
+   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
+   current->comm, current->pid, 
tsk_is_oom_victim(current));
ret = out_of_memory();
mutex_unlock(_lock);
return ret;



Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
Now, let's test next-20180803 .

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
116b181bb646afedd770985de20a68721bdb2648

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..852cd3dbdcd9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup 
*memcg, gfp_t gfp_mask,
bool ret;
 
mutex_lock(_lock);
+   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
+   current->comm, current->pid, 
tsk_is_oom_victim(current));
ret = out_of_memory();
mutex_unlock(_lock);
return ret;



Re: WARNING in try_charge

2018-08-06 Thread Johannes Weiner
On Mon, Aug 06, 2018 at 04:21:24PM +0200, Michal Hocko wrote:
> On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
> > On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
> > > If you have a strong reason to believe that this is an abuse of WARN I
> > > am all happy to change that. But I haven't heard any yet, to be honest.
> > 
> > WARN must not be used for anything that is not kernel bugs. If this is
> > not kernel bug, WARN must not be used here.
> 
> This is rather strong wording without any backing arguments. I strongly
> doubt 90% of existing WARN* match this expectation. WARN* has
> traditionally been a way to tell that something suspicious is going on.
> Those situation are mostly likely not fatal but it is good to know they
> are happening.

I have to agree with Dmitry here. WARN should indicate a real kernel
issue, not user input that knowingly triggers undesirable behavior in
the kernel. It's our assert() for states we don't think are possible.

I would wager that MOST developers and users understand it that way.


Re: WARNING in try_charge

2018-08-06 Thread Johannes Weiner
On Mon, Aug 06, 2018 at 04:21:24PM +0200, Michal Hocko wrote:
> On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
> > On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
> > > If you have a strong reason to believe that this is an abuse of WARN I
> > > am all happy to change that. But I haven't heard any yet, to be honest.
> > 
> > WARN must not be used for anything that is not kernel bugs. If this is
> > not kernel bug, WARN must not be used here.
> 
> This is rather strong wording without any backing arguments. I strongly
> doubt 90% of existing WARN* match this expectation. WARN* has
> traditionally been a way to tell that something suspicious is going on.
> Those situation are mostly likely not fatal but it is good to know they
> are happening.

I have to agree with Dmitry here. WARN should indicate a real kernel
issue, not user input that knowingly triggers undesirable behavior in
the kernel. It's our assert() for states we don't think are possible.

I would wager that MOST developers and users understand it that way.


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/06 23:58, Michal Hocko wrote:
>> Since I can't find mm related changes between next-20180803 (syzbot can 
>> reproduce) and
>> next-20180806 (syzbot has not reproduced), I can't guess what makes this 
>> problem go away.
> 
> Hmm, but original report was against 4.18.0-rc6-next-20180725+ kernel.
> And that one had the old group oom code. /me confused.
> 

Yes. But I confirmed that syzbot can reproduce this problem with next-20180803
which already dropped the old group oom code. Therefore, I think that syzbot is
hitting a problem other than the old group oom code.


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/06 23:58, Michal Hocko wrote:
>> Since I can't find mm related changes between next-20180803 (syzbot can 
>> reproduce) and
>> next-20180806 (syzbot has not reproduced), I can't guess what makes this 
>> problem go away.
> 
> Hmm, but original report was against 4.18.0-rc6-next-20180725+ kernel.
> And that one had the old group oom code. /me confused.
> 

Yes. But I confirmed that syzbot can reproduce this problem with next-20180803
which already dropped the old group oom code. Therefore, I think that syzbot is
hitting a problem other than the old group oom code.


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko  wrote:
> On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
>> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
> [...]
>> >> A much
>> >> friendlier for user way to say this would be print a message at the
>> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
>> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
>> >> trace (does not give any useful info for user). And return EINVAL if
>> >> it can't fly at all? And then leave the "or a kernel bug" part for the
>> >> WARNING each occurrence of which we do want to be reported to kernel
>> >> developers.
>> >
>> > But this is not applicable here. Your misconfiguration is quite obvious
>> > because you simply set the hard limit to 0. This is not the only
>> > situation when this can happen. There is no clear point to tell, you are
>> > doing this wrong. If it was we would do it at that point obviously.
>>
>> But, isn't there a point were hard limit is set to 0? I would expect
>> there is a something like cgroup file write handler with a value of 0
>> or something.
>
> Yeah, but this is only one instance of the problem. Other is that the
> memcg is not reclaimable for any other reasons. And we do not know what
> those might be
>
>>
>> > If you have a strong reason to believe that this is an abuse of WARN I
>> > am all happy to change that. But I haven't heard any yet, to be honest.
>>
>> WARN must not be used for anything that is not kernel bugs. If this is
>> not kernel bug, WARN must not be used here.
>
> This is rather strong wording without any backing arguments. I strongly
> doubt 90% of existing WARN* match this expectation. WARN* has
> traditionally been a way to tell that something suspicious is going on.
> Those situation are mostly likely not fatal but it is good to know they
> are happening.

Today syzbot covers about 1M lines of kernel code, and we fuzz for
several years with panic_on_warn=1 and each unique crash is recorded
and reported. Over several thousands bugs that we reported, there were
maybe 2 dozens of such cases (WARN on invalid user inputs, ENOMEM,
etc). The solution always was to remove the WARNING on covert to
pr_err. As of now, I see only 2 such cases open: this one and WARN on
ENOMEM in input subsystem.

Either way, we do badly need this separation. If there are deviations
we need to continue fixing them.


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko  wrote:
> On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
>> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
> [...]
>> >> A much
>> >> friendlier for user way to say this would be print a message at the
>> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
>> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
>> >> trace (does not give any useful info for user). And return EINVAL if
>> >> it can't fly at all? And then leave the "or a kernel bug" part for the
>> >> WARNING each occurrence of which we do want to be reported to kernel
>> >> developers.
>> >
>> > But this is not applicable here. Your misconfiguration is quite obvious
>> > because you simply set the hard limit to 0. This is not the only
>> > situation when this can happen. There is no clear point to tell, you are
>> > doing this wrong. If it was we would do it at that point obviously.
>>
>> But, isn't there a point were hard limit is set to 0? I would expect
>> there is a something like cgroup file write handler with a value of 0
>> or something.
>
> Yeah, but this is only one instance of the problem. Other is that the
> memcg is not reclaimable for any other reasons. And we do not know what
> those might be
>
>>
>> > If you have a strong reason to believe that this is an abuse of WARN I
>> > am all happy to change that. But I haven't heard any yet, to be honest.
>>
>> WARN must not be used for anything that is not kernel bugs. If this is
>> not kernel bug, WARN must not be used here.
>
> This is rather strong wording without any backing arguments. I strongly
> doubt 90% of existing WARN* match this expectation. WARN* has
> traditionally been a way to tell that something suspicious is going on.
> Those situation are mostly likely not fatal but it is good to know they
> are happening.

Today syzbot covers about 1M lines of kernel code, and we fuzz for
several years with panic_on_warn=1 and each unique crash is recorded
and reported. Over several thousands bugs that we reported, there were
maybe 2 dozens of such cases (WARN on invalid user inputs, ENOMEM,
etc). The solution always was to remove the WARNING on covert to
pr_err. As of now, I see only 2 such cases open: this one and WARN on
ENOMEM in input subsystem.

Either way, we do badly need this separation. If there are deviations
we need to continue fixing them.


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/06 23:54, David Howells wrote:
> Do you have a link to the problem?
> 
> David
> 

https://groups.google.com/forum/#!msg/syzkaller-bugs/R03vI7RCVco/0PijCTrcCgAJ

syzbot found a reproducer, and the reproducer was working until next-20180803.
But the reproducer is failing to reproduce this problem in next-20180806 despite
there is no mm related change between next-20180803 and next-20180806.

Therefore, I suspect that the reproducer is no longer working as intended. And
there was parser change (your patch) between next-20180803 and next-20180806.



Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/06 23:54, David Howells wrote:
> Do you have a link to the problem?
> 
> David
> 

https://groups.google.com/forum/#!msg/syzkaller-bugs/R03vI7RCVco/0PijCTrcCgAJ

syzbot found a reproducer, and the reproducer was working until next-20180803.
But the reproducer is failing to reproduce this problem in next-20180806 despite
there is no mm related change between next-20180803 and next-20180806.

Therefore, I suspect that the reproducer is no longer working as intended. And
there was parser change (your patch) between next-20180803 and next-20180806.



Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 23:41:22, Tetsuo Handa wrote:
> +David Howells
> 
> On 2018/08/06 20:32, Michal Hocko wrote:
> > On Mon 06-08-18 04:27:02, syzbot wrote:
> >> Hello,
> >>
> >> syzbot has tested the proposed patch and the reproducer did not trigger
> >> crash:
> >>
> >> Reported-and-tested-by:
> >> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
> >>
> >> Tested on:
> >>
> >> commit: 8c8399e0a3fb Add linux-next specific files for 20180806
> >> git tree:   linux-next
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e
> >> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> >> patch:  https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240
> >>
> >> Note: testing is done by a robot and is best-effort only.
> > 
> > OK, so this smells like a problem in the previous group oom changes. Or
> > maybe it is not very easy to reproduce?
> > 
> 
> Since I can't find mm related changes between next-20180803 (syzbot can 
> reproduce) and
> next-20180806 (syzbot has not reproduced), I can't guess what makes this 
> problem go away.

Hmm, but original report was against 4.18.0-rc6-next-20180725+ kernel.
And that one had the old group oom code. /me confused.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 23:41:22, Tetsuo Handa wrote:
> +David Howells
> 
> On 2018/08/06 20:32, Michal Hocko wrote:
> > On Mon 06-08-18 04:27:02, syzbot wrote:
> >> Hello,
> >>
> >> syzbot has tested the proposed patch and the reproducer did not trigger
> >> crash:
> >>
> >> Reported-and-tested-by:
> >> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
> >>
> >> Tested on:
> >>
> >> commit: 8c8399e0a3fb Add linux-next specific files for 20180806
> >> git tree:   linux-next
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e
> >> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> >> patch:  https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240
> >>
> >> Note: testing is done by a robot and is best-effort only.
> > 
> > OK, so this smells like a problem in the previous group oom changes. Or
> > maybe it is not very easy to reproduce?
> > 
> 
> Since I can't find mm related changes between next-20180803 (syzbot can 
> reproduce) and
> next-20180806 (syzbot has not reproduced), I can't guess what makes this 
> problem go away.

Hmm, but original report was against 4.18.0-rc6-next-20180725+ kernel.
And that one had the old group oom code. /me confused.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko  wrote:
> On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
>> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
> [...]
>> >> A much
>> >> friendlier for user way to say this would be print a message at the
>> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
>> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
>> >> trace (does not give any useful info for user). And return EINVAL if
>> >> it can't fly at all? And then leave the "or a kernel bug" part for the
>> >> WARNING each occurrence of which we do want to be reported to kernel
>> >> developers.
>> >
>> > But this is not applicable here. Your misconfiguration is quite obvious
>> > because you simply set the hard limit to 0. This is not the only
>> > situation when this can happen. There is no clear point to tell, you are
>> > doing this wrong. If it was we would do it at that point obviously.
>>
>> But, isn't there a point were hard limit is set to 0? I would expect
>> there is a something like cgroup file write handler with a value of 0
>> or something.
>
> Yeah, but this is only one instance of the problem. Other is that the
> memcg is not reclaimable for any other reasons. And we do not know what
> those might be
>
>>
>> > If you have a strong reason to believe that this is an abuse of WARN I
>> > am all happy to change that. But I haven't heard any yet, to be honest.
>>
>> WARN must not be used for anything that is not kernel bugs. If this is
>> not kernel bug, WARN must not be used here.
>
> This is rather strong wording without any backing arguments. I strongly
> doubt 90% of existing WARN* match this expectation. WARN* has
> traditionally been a way to tell that something suspicious is going on.
> Those situation are mostly likely not fatal but it is good to know they
> are happening.
>
> Sure there is that panic_on_warn thingy which you seem to be using and I
> suspect it is a reason why you are so careful about warnings in general
> but my experience tells me that this configuration is barely usable
> except for testing (which is your case).
>
> But as I've said, I do not insist on WARN here. All I care about is to
> warn user that something might go south and this may be either due to
> misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.

I am a bit lost. Can limit=0 legally lead to the warnings? Or there is
also a kernel bug on top of that and it's actually a kernel bug that
provokes the warning?
If it's a kernel bug, then I propose to stop arguing about
configuration and concentrate on the bug.
If it's just the misconfiguration that triggers the warning,  then can
we separate the 2 causes of the warning (user misconfiguration and
kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print
a line to console if necessary)? Or if the limit=0 is somehow not
possible/desirable to detect right away, check limit=0 at the point of
the warning and don't want?


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 4:21 PM, Michal Hocko  wrote:
> On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
>> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
> [...]
>> >> A much
>> >> friendlier for user way to say this would be print a message at the
>> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
>> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
>> >> trace (does not give any useful info for user). And return EINVAL if
>> >> it can't fly at all? And then leave the "or a kernel bug" part for the
>> >> WARNING each occurrence of which we do want to be reported to kernel
>> >> developers.
>> >
>> > But this is not applicable here. Your misconfiguration is quite obvious
>> > because you simply set the hard limit to 0. This is not the only
>> > situation when this can happen. There is no clear point to tell, you are
>> > doing this wrong. If it was we would do it at that point obviously.
>>
>> But, isn't there a point were hard limit is set to 0? I would expect
>> there is a something like cgroup file write handler with a value of 0
>> or something.
>
> Yeah, but this is only one instance of the problem. Other is that the
> memcg is not reclaimable for any other reasons. And we do not know what
> those might be
>
>>
>> > If you have a strong reason to believe that this is an abuse of WARN I
>> > am all happy to change that. But I haven't heard any yet, to be honest.
>>
>> WARN must not be used for anything that is not kernel bugs. If this is
>> not kernel bug, WARN must not be used here.
>
> This is rather strong wording without any backing arguments. I strongly
> doubt 90% of existing WARN* match this expectation. WARN* has
> traditionally been a way to tell that something suspicious is going on.
> Those situation are mostly likely not fatal but it is good to know they
> are happening.
>
> Sure there is that panic_on_warn thingy which you seem to be using and I
> suspect it is a reason why you are so careful about warnings in general
> but my experience tells me that this configuration is barely usable
> except for testing (which is your case).
>
> But as I've said, I do not insist on WARN here. All I care about is to
> warn user that something might go south and this may be either due to
> misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.

I am a bit lost. Can limit=0 legally lead to the warnings? Or there is
also a kernel bug on top of that and it's actually a kernel bug that
provokes the warning?
If it's a kernel bug, then I propose to stop arguing about
configuration and concentrate on the bug.
If it's just the misconfiguration that triggers the warning,  then can
we separate the 2 causes of the warning (user misconfiguration and
kernel bugs)? Say, return EINVAL when mem limit is set to 0 (and print
a line to console if necessary)? Or if the limit=0 is somehow not
possible/desirable to detect right away, check limit=0 at the point of
the warning and don't want?


Re: WARNING in try_charge

2018-08-06 Thread David Howells
Do you have a link to the problem?

David


Re: WARNING in try_charge

2018-08-06 Thread David Howells
Do you have a link to the problem?

David


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
+David Howells

On 2018/08/06 20:32, Michal Hocko wrote:
> On Mon 06-08-18 04:27:02, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch and the reproducer did not trigger
>> crash:
>>
>> Reported-and-tested-by:
>> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
>>
>> Tested on:
>>
>> commit: 8c8399e0a3fb Add linux-next specific files for 20180806
>> git tree:   linux-next
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e
>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>> patch:  https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240
>>
>> Note: testing is done by a robot and is best-effort only.
> 
> OK, so this smells like a problem in the previous group oom changes. Or
> maybe it is not very easy to reproduce?
> 

Since I can't find mm related changes between next-20180803 (syzbot can 
reproduce) and
next-20180806 (syzbot has not reproduced), I can't guess what makes this 
problem go away.

But since this problem did not occur for 3.5 hours on next-20180806 (when this 
problem
was occurring once per 60-90 minutes), the reproducer might not be working as 
intended
due to "kernfs, sysfs, cgroup, intel_rdt: Support fs_context" or something...

  ./kernel/cgroup/cgroup-internal.h  |  
  3
  ./kernel/cgroup/cgroup-v1.c|  
211
  ./kernel/cgroup/cgroup.c   |  
 81


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
+David Howells

On 2018/08/06 20:32, Michal Hocko wrote:
> On Mon 06-08-18 04:27:02, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch and the reproducer did not trigger
>> crash:
>>
>> Reported-and-tested-by:
>> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
>>
>> Tested on:
>>
>> commit: 8c8399e0a3fb Add linux-next specific files for 20180806
>> git tree:   linux-next
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e
>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>> patch:  https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240
>>
>> Note: testing is done by a robot and is best-effort only.
> 
> OK, so this smells like a problem in the previous group oom changes. Or
> maybe it is not very easy to reproduce?
> 

Since I can't find mm related changes between next-20180803 (syzbot can 
reproduce) and
next-20180806 (syzbot has not reproduced), I can't guess what makes this 
problem go away.

But since this problem did not occur for 3.5 hours on next-20180806 (when this 
problem
was occurring once per 60-90 minutes), the reproducer might not be working as 
intended
due to "kernfs, sysfs, cgroup, intel_rdt: Support fs_context" or something...

  ./kernel/cgroup/cgroup-internal.h  |  
  3
  ./kernel/cgroup/cgroup-v1.c|  
211
  ./kernel/cgroup/cgroup.c   |  
 81


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
[...]
> >> A much
> >> friendlier for user way to say this would be print a message at the
> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
> >> trace (does not give any useful info for user). And return EINVAL if
> >> it can't fly at all? And then leave the "or a kernel bug" part for the
> >> WARNING each occurrence of which we do want to be reported to kernel
> >> developers.
> >
> > But this is not applicable here. Your misconfiguration is quite obvious
> > because you simply set the hard limit to 0. This is not the only
> > situation when this can happen. There is no clear point to tell, you are
> > doing this wrong. If it was we would do it at that point obviously.
> 
> But, isn't there a point were hard limit is set to 0? I would expect
> there is a something like cgroup file write handler with a value of 0
> or something.

Yeah, but this is only one instance of the problem. Other is that the
memcg is not reclaimable for any other reasons. And we do not know what
those might be

> 
> > If you have a strong reason to believe that this is an abuse of WARN I
> > am all happy to change that. But I haven't heard any yet, to be honest.
> 
> WARN must not be used for anything that is not kernel bugs. If this is
> not kernel bug, WARN must not be used here.

This is rather strong wording without any backing arguments. I strongly
doubt 90% of existing WARN* match this expectation. WARN* has
traditionally been a way to tell that something suspicious is going on.
Those situation are mostly likely not fatal but it is good to know they
are happening.

Sure there is that panic_on_warn thingy which you seem to be using and I
suspect it is a reason why you are so careful about warnings in general
but my experience tells me that this configuration is barely usable
except for testing (which is your case).

But as I've said, I do not insist on WARN here. All I care about is to
warn user that something might go south and this may be either due to
misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 13:57:38, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
[...]
> >> A much
> >> friendlier for user way to say this would be print a message at the
> >> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
> >> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
> >> trace (does not give any useful info for user). And return EINVAL if
> >> it can't fly at all? And then leave the "or a kernel bug" part for the
> >> WARNING each occurrence of which we do want to be reported to kernel
> >> developers.
> >
> > But this is not applicable here. Your misconfiguration is quite obvious
> > because you simply set the hard limit to 0. This is not the only
> > situation when this can happen. There is no clear point to tell, you are
> > doing this wrong. If it was we would do it at that point obviously.
> 
> But, isn't there a point were hard limit is set to 0? I would expect
> there is a something like cgroup file write handler with a value of 0
> or something.

Yeah, but this is only one instance of the problem. Other is that the
memcg is not reclaimable for any other reasons. And we do not know what
those might be

> 
> > If you have a strong reason to believe that this is an abuse of WARN I
> > am all happy to change that. But I haven't heard any yet, to be honest.
> 
> WARN must not be used for anything that is not kernel bugs. If this is
> not kernel bug, WARN must not be used here.

This is rather strong wording without any backing arguments. I strongly
doubt 90% of existing WARN* match this expectation. WARN* has
traditionally been a way to tell that something suspicious is going on.
Those situation are mostly likely not fatal but it is good to know they
are happening.

Sure there is that panic_on_warn thingy which you seem to be using and I
suspect it is a reason why you are so careful about warnings in general
but my experience tells me that this configuration is barely usable
except for testing (which is your case).

But as I've said, I do not insist on WARN here. All I care about is to
warn user that something might go south and this may be either due to
misconfiguration or a subtly wrong memcg reclaim/OOM handler behavior.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 1:32 PM, Michal Hocko  wrote:
> On Mon 06-08-18 04:27:02, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch and the reproducer did not trigger
>> crash:
>>
>> Reported-and-tested-by:
>> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
>>
>> Tested on:
>>
>> commit: 8c8399e0a3fb Add linux-next specific files for 20180806
>> git tree:   linux-next
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e
>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>> patch:  https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240
>>
>> Note: testing is done by a robot and is best-effort only.
>
> OK, so this smells like a problem in the previous group oom changes. Or
> maybe it is not very easy to reproduce?

It's possible to ask syzbot to test on a particular tree/commit too.


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 1:32 PM, Michal Hocko  wrote:
> On Mon 06-08-18 04:27:02, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch and the reproducer did not trigger
>> crash:
>>
>> Reported-and-tested-by:
>> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
>>
>> Tested on:
>>
>> commit: 8c8399e0a3fb Add linux-next specific files for 20180806
>> git tree:   linux-next
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e
>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>> patch:  https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240
>>
>> Note: testing is done by a robot and is best-effort only.
>
> OK, so this smells like a problem in the previous group oom changes. Or
> maybe it is not very easy to reproduce?

It's possible to ask syzbot to test on a particular tree/commit too.


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
>> >> > More interesting stuff is higher in the kernel log
>> >> > : [  366.435015] 
>> >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
>> >> > : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
>> >> >
>> >> > Are you sure you want to have hard limit set to 0?
>> >>
>> >> syzkaller really does not mind to have it.
>> >
>> > So what do you use it for? What do you actually test by this setting?
>>
>> syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is
>> doable from user-space. Some of that may not make sense, but it does
>> not matter because kernel should still stand still.
>
> I am not questioning that. What I am saying is that the configuration
> doesn't make much sense and the kernel warns about it.
>
>> > [...]
>> >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> >> > index 4603ad75c9a9..852cd3dbdcd9 100644
>> >> > --- a/mm/memcontrol.c
>> >> > +++ b/mm/memcontrol.c
>> >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct 
>> >> > mem_cgroup *memcg, gfp_t gfp_mask,
>> >> > bool ret;
>> >> >
>> >> > mutex_lock(_lock);
>> >> > +   pr_info("task=%s pid=%d invoked memcg oom killer. 
>> >> > oom_victim=%d\n",
>> >> > +   current->comm, current->pid, 
>> >> > tsk_is_oom_victim(current));
>> >> > ret = out_of_memory();
>> >> > mutex_unlock(_lock);
>> >> > return ret;
>> >> >
>> >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit
>> >> > and basically no memory charged by existing tasks is not going to fly
>> >> > and the warning is exactly to call that out.
>> >>
>> >>
>> >> Please-please-please do not mix kernel bugs and notices to user into
>> >> the same bucket:
>> >
>> > Well, WARN_ON used to be a standard way to make user aware of a
>> > misbehavior. In this case it warns about a pottential runaway when memcg
>> > is misconfigured. I do not insist on using WARN_ON here of course. If
>> > there is a general agreement that such a condition is better handled by
>> > pr_err then I am fine with it. Users tend to be more sensitive on
>> > WARN_ONs though.
>>
>> The docs change was acked by Greg, and Andrew took it into mm, Linus
>> was CCed too. It missed the release because I guess it's comments only
>> change, but otherwise it should reach upstream tree on the next merge
>> window.
>>
>> WARN is _not_ a common way to notify users today. syzbot reports _all_
>> WARN occurrences and you can see there are not many of them now
>> (probably 1 another now, +dtor for that one):
>> https://syzkaller.appspot.com#upstream
>> There is probably some long tail that we need to fix. We really do
>> want systematic testing capability. You do not want every of 2 billion
>> linux users to come to you with this kernel splat, just so that you
>> can explain to them that it's some programs of their machines doing
>> something wrong, right?
>
> [This is an orthogonal discussion I believe]
>
> How does it differ from pr_err though?

pr_err output looks very different, says that it is the user program
that does something wrong, explains what exactly is done wrong and
does not contain traces, offsets of hex numbers that scare end users.
WARN for kernel bugs/pr_err for user separation allows to easily
understand (including by computer programs) if it is a kernel bugs
(something to notify kernel developers about) or not. In particular if
it would be the case here, we would not have this bug report and would
not spent time here.

>> WARN is really a bad way to inform a user about something. Consider a
>> non-kernel developer, perhaps even non-programmer. What they see is
>> "WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710
>> try_charge+0x734/0x1680" followed by some obscure things and hex
>> numbers. File:line reference is pointless, they don't what what/where
>> it is.
>
> Well, you get a precise location where the problem happens and the
> backtrace to see how it happened. This is much more information than,
> pr_err without dump_stack.

This information is important for kernel bugs, but not for invalid
values passed by user programs. For the latter the exact location
where the problem happened is not there, it's in some user program.
For the latter it is more important to explain in readable language
what exactly arguments to what call were wrong.
Say, if you enter a wrong pin in ATM, it says "you entered wrong pin"
and not dumps some internal state in hex.

>> This one is slightly better because it prints "Memory cgroup
>> charge failed because of no reclaimable memory! This looks like a
>> misconfiguration or a kernel bug." before the warning. But still it
>> says "or a kernel bug", which means that they will come to you.
>
> Yeah, and that was the purpose of the thing.

Why? Was it clear how exactly if can happen? Can we refine it 

Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 1:02 PM, Michal Hocko  wrote:
>> >> > More interesting stuff is higher in the kernel log
>> >> > : [  366.435015] 
>> >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
>> >> > : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
>> >> >
>> >> > Are you sure you want to have hard limit set to 0?
>> >>
>> >> syzkaller really does not mind to have it.
>> >
>> > So what do you use it for? What do you actually test by this setting?
>>
>> syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is
>> doable from user-space. Some of that may not make sense, but it does
>> not matter because kernel should still stand still.
>
> I am not questioning that. What I am saying is that the configuration
> doesn't make much sense and the kernel warns about it.
>
>> > [...]
>> >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> >> > index 4603ad75c9a9..852cd3dbdcd9 100644
>> >> > --- a/mm/memcontrol.c
>> >> > +++ b/mm/memcontrol.c
>> >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct 
>> >> > mem_cgroup *memcg, gfp_t gfp_mask,
>> >> > bool ret;
>> >> >
>> >> > mutex_lock(_lock);
>> >> > +   pr_info("task=%s pid=%d invoked memcg oom killer. 
>> >> > oom_victim=%d\n",
>> >> > +   current->comm, current->pid, 
>> >> > tsk_is_oom_victim(current));
>> >> > ret = out_of_memory();
>> >> > mutex_unlock(_lock);
>> >> > return ret;
>> >> >
>> >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit
>> >> > and basically no memory charged by existing tasks is not going to fly
>> >> > and the warning is exactly to call that out.
>> >>
>> >>
>> >> Please-please-please do not mix kernel bugs and notices to user into
>> >> the same bucket:
>> >
>> > Well, WARN_ON used to be a standard way to make user aware of a
>> > misbehavior. In this case it warns about a pottential runaway when memcg
>> > is misconfigured. I do not insist on using WARN_ON here of course. If
>> > there is a general agreement that such a condition is better handled by
>> > pr_err then I am fine with it. Users tend to be more sensitive on
>> > WARN_ONs though.
>>
>> The docs change was acked by Greg, and Andrew took it into mm, Linus
>> was CCed too. It missed the release because I guess it's comments only
>> change, but otherwise it should reach upstream tree on the next merge
>> window.
>>
>> WARN is _not_ a common way to notify users today. syzbot reports _all_
>> WARN occurrences and you can see there are not many of them now
>> (probably 1 another now, +dtor for that one):
>> https://syzkaller.appspot.com#upstream
>> There is probably some long tail that we need to fix. We really do
>> want systematic testing capability. You do not want every of 2 billion
>> linux users to come to you with this kernel splat, just so that you
>> can explain to them that it's some programs of their machines doing
>> something wrong, right?
>
> [This is an orthogonal discussion I believe]
>
> How does it differ from pr_err though?

pr_err output looks very different, says that it is the user program
that does something wrong, explains what exactly is done wrong and
does not contain traces, offsets of hex numbers that scare end users.
WARN for kernel bugs/pr_err for user separation allows to easily
understand (including by computer programs) if it is a kernel bugs
(something to notify kernel developers about) or not. In particular if
it would be the case here, we would not have this bug report and would
not spent time here.

>> WARN is really a bad way to inform a user about something. Consider a
>> non-kernel developer, perhaps even non-programmer. What they see is
>> "WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710
>> try_charge+0x734/0x1680" followed by some obscure things and hex
>> numbers. File:line reference is pointless, they don't what what/where
>> it is.
>
> Well, you get a precise location where the problem happens and the
> backtrace to see how it happened. This is much more information than,
> pr_err without dump_stack.

This information is important for kernel bugs, but not for invalid
values passed by user programs. For the latter the exact location
where the problem happened is not there, it's in some user program.
For the latter it is more important to explain in readable language
what exactly arguments to what call were wrong.
Say, if you enter a wrong pin in ATM, it says "you entered wrong pin"
and not dumps some internal state in hex.

>> This one is slightly better because it prints "Memory cgroup
>> charge failed because of no reclaimable memory! This looks like a
>> misconfiguration or a kernel bug." before the warning. But still it
>> says "or a kernel bug", which means that they will come to you.
>
> Yeah, and that was the purpose of the thing.

Why? Was it clear how exactly if can happen? Can we refine it 

Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 04:27:02, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch and the reproducer did not trigger
> crash:
> 
> Reported-and-tested-by:
> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
> 
> Tested on:
> 
> commit: 8c8399e0a3fb Add linux-next specific files for 20180806
> git tree:   linux-next
> kernel config:  https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> patch:  https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240
> 
> Note: testing is done by a robot and is best-effort only.

OK, so this smells like a problem in the previous group oom changes. Or
maybe it is not very easy to reproduce?

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 04:27:02, syzbot wrote:
> Hello,
> 
> syzbot has tested the proposed patch and the reproducer did not trigger
> crash:
> 
> Reported-and-tested-by:
> syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
> 
> Tested on:
> 
> commit: 8c8399e0a3fb Add linux-next specific files for 20180806
> git tree:   linux-next
> kernel config:  https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> patch:  https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240
> 
> Note: testing is done by a robot and is best-effort only.

OK, so this smells like a problem in the previous group oom changes. Or
maybe it is not very easy to reproduce?

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread syzbot

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:


Reported-and-tested-by:  
syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com


Tested on:

commit: 8c8399e0a3fb Add linux-next specific files for 20180806
git tree:   linux-next
kernel config:  https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
patch:  https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240

Note: testing is done by a robot and is best-effort only.


Re: WARNING in try_charge

2018-08-06 Thread syzbot

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:


Reported-and-tested-by:  
syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com


Tested on:

commit: 8c8399e0a3fb Add linux-next specific files for 20180806
git tree:   linux-next
kernel config:  https://syzkaller.appspot.com/x/.config?x=1b6bc1781e49e93e
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
patch:  https://syzkaller.appspot.com/x/patch.diff?x=14fe18e240

Note: testing is done by a robot and is best-effort only.


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 19:47:00, Tetsuo Handa wrote:
> On 2018/08/06 19:39, Dmitry Vyukov wrote:
> > On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko  wrote:
> >> Btw. running with the above diff on top might help us to ideantify
> >> whether this is a pre-mature warning or a valid one. Still useful to
> >> find out.
> 
> Since syzbot already found a syz reproducer, you can ask syzbot to test it.
> 
> > 
> > The bug report has a reproducer, so you can run it with the patch. Or
> > ask syzbot to test your patch:
> > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
> > Which basically boils down to saying:
> > 
> > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > master
> 
> Excuse me, but this is linux-next only problem. Therefore,

If this really is a linux-next only problem then please retest with the
current linux-next which has dropped the and replaced the group oom
code.

> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
> master
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4603ad75c9a9..852cd3dbdcd9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup 
> *memcg, gfp_t gfp_mask,
>   bool ret;
>  
>   mutex_lock(_lock);
> + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
> + current->comm, current->pid, 
> tsk_is_oom_victim(current));
>   ret = out_of_memory();
>   mutex_unlock(_lock);
>   return ret;
> 
> F.Y.I. Waiting until __mmput() completes (with timeout using OOM score 
> feedback)
> ( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solves this 
> race.

Which just means that something else is broken. Seriously, timout is not
going to fix anything. It merely changes the picture.

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 19:47:00, Tetsuo Handa wrote:
> On 2018/08/06 19:39, Dmitry Vyukov wrote:
> > On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko  wrote:
> >> Btw. running with the above diff on top might help us to ideantify
> >> whether this is a pre-mature warning or a valid one. Still useful to
> >> find out.
> 
> Since syzbot already found a syz reproducer, you can ask syzbot to test it.
> 
> > 
> > The bug report has a reproducer, so you can run it with the patch. Or
> > ask syzbot to test your patch:
> > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
> > Which basically boils down to saying:
> > 
> > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > master
> 
> Excuse me, but this is linux-next only problem. Therefore,

If this really is a linux-next only problem then please retest with the
current linux-next which has dropped the and replaced the group oom
code.

> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
> master
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 4603ad75c9a9..852cd3dbdcd9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup 
> *memcg, gfp_t gfp_mask,
>   bool ret;
>  
>   mutex_lock(_lock);
> + pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
> + current->comm, current->pid, 
> tsk_is_oom_victim(current));
>   ret = out_of_memory();
>   mutex_unlock(_lock);
>   return ret;
> 
> F.Y.I. Waiting until __mmput() completes (with timeout using OOM score 
> feedback)
> ( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solves this 
> race.

Which just means that something else is broken. Seriously, timout is not
going to fix anything. It merely changes the picture.

-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 12:34:30, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko  wrote:
> > On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote:
> >> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko  wrote:
> > [...]
> >> > More interesting stuff is higher in the kernel log
> >> > : [  366.435015] 
> >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
> >> > : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
> >> >
> >> > Are you sure you want to have hard limit set to 0?
> >>
> >> syzkaller really does not mind to have it.
> >
> > So what do you use it for? What do you actually test by this setting?
> 
> syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is
> doable from user-space. Some of that may not make sense, but it does
> not matter because kernel should still stand still.

I am not questioning that. What I am saying is that the configuration
doesn't make much sense and the kernel warns about it.

> > [...]
> >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> > index 4603ad75c9a9..852cd3dbdcd9 100644
> >> > --- a/mm/memcontrol.c
> >> > +++ b/mm/memcontrol.c
> >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct 
> >> > mem_cgroup *memcg, gfp_t gfp_mask,
> >> > bool ret;
> >> >
> >> > mutex_lock(_lock);
> >> > +   pr_info("task=%s pid=%d invoked memcg oom killer. 
> >> > oom_victim=%d\n",
> >> > +   current->comm, current->pid, 
> >> > tsk_is_oom_victim(current));
> >> > ret = out_of_memory();
> >> > mutex_unlock(_lock);
> >> > return ret;
> >> >
> >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit
> >> > and basically no memory charged by existing tasks is not going to fly
> >> > and the warning is exactly to call that out.
> >>
> >>
> >> Please-please-please do not mix kernel bugs and notices to user into
> >> the same bucket:
> >
> > Well, WARN_ON used to be a standard way to make user aware of a
> > misbehavior. In this case it warns about a pottential runaway when memcg
> > is misconfigured. I do not insist on using WARN_ON here of course. If
> > there is a general agreement that such a condition is better handled by
> > pr_err then I am fine with it. Users tend to be more sensitive on
> > WARN_ONs though.
> 
> The docs change was acked by Greg, and Andrew took it into mm, Linus
> was CCed too. It missed the release because I guess it's comments only
> change, but otherwise it should reach upstream tree on the next merge
> window.
> 
> WARN is _not_ a common way to notify users today. syzbot reports _all_
> WARN occurrences and you can see there are not many of them now
> (probably 1 another now, +dtor for that one):
> https://syzkaller.appspot.com#upstream
> There is probably some long tail that we need to fix. We really do
> want systematic testing capability. You do not want every of 2 billion
> linux users to come to you with this kernel splat, just so that you
> can explain to them that it's some programs of their machines doing
> something wrong, right?

[This is an orthogonal discussion I believe]

How does it differ from pr_err though?

> WARN is really a bad way to inform a user about something. Consider a
> non-kernel developer, perhaps even non-programmer. What they see is
> "WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710
> try_charge+0x734/0x1680" followed by some obscure things and hex
> numbers. File:line reference is pointless, they don't what what/where
> it is.

Well, you get a precise location where the problem happens and the
backtrace to see how it happened. This is much more information than,
pr_err without dump_stack.

> This one is slightly better because it prints "Memory cgroup
> charge failed because of no reclaimable memory! This looks like a
> misconfiguration or a kernel bug." before the warning. But still it
> says "or a kernel bug", which means that they will come to you.

Yeah, and that was the purpose of the thing.

> A much
> friendlier for user way to say this would be print a message at the
> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
> trace (does not give any useful info for user). And return EINVAL if
> it can't fly at all? And then leave the "or a kernel bug" part for the
> WARNING each occurrence of which we do want to be reported to kernel
> developers.

But this is not applicable here. Your misconfiguration is quite obvious
because you simply set the hard limit to 0. This is not the only
situation when this can happen. There is no clear point to tell, you are
doing this wrong. If it was we would do it at that point obviously.

If you have a strong reason to believe that this is an abuse of WARN I
am all happy to change that. But I haven't heard any yet, to be honest.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 12:34:30, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko  wrote:
> > On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote:
> >> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko  wrote:
> > [...]
> >> > More interesting stuff is higher in the kernel log
> >> > : [  366.435015] 
> >> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
> >> > : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
> >> >
> >> > Are you sure you want to have hard limit set to 0?
> >>
> >> syzkaller really does not mind to have it.
> >
> > So what do you use it for? What do you actually test by this setting?
> 
> syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is
> doable from user-space. Some of that may not make sense, but it does
> not matter because kernel should still stand still.

I am not questioning that. What I am saying is that the configuration
doesn't make much sense and the kernel warns about it.

> > [...]
> >> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> > index 4603ad75c9a9..852cd3dbdcd9 100644
> >> > --- a/mm/memcontrol.c
> >> > +++ b/mm/memcontrol.c
> >> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct 
> >> > mem_cgroup *memcg, gfp_t gfp_mask,
> >> > bool ret;
> >> >
> >> > mutex_lock(_lock);
> >> > +   pr_info("task=%s pid=%d invoked memcg oom killer. 
> >> > oom_victim=%d\n",
> >> > +   current->comm, current->pid, 
> >> > tsk_is_oom_victim(current));
> >> > ret = out_of_memory();
> >> > mutex_unlock(_lock);
> >> > return ret;
> >> >
> >> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit
> >> > and basically no memory charged by existing tasks is not going to fly
> >> > and the warning is exactly to call that out.
> >>
> >>
> >> Please-please-please do not mix kernel bugs and notices to user into
> >> the same bucket:
> >
> > Well, WARN_ON used to be a standard way to make user aware of a
> > misbehavior. In this case it warns about a pottential runaway when memcg
> > is misconfigured. I do not insist on using WARN_ON here of course. If
> > there is a general agreement that such a condition is better handled by
> > pr_err then I am fine with it. Users tend to be more sensitive on
> > WARN_ONs though.
> 
> The docs change was acked by Greg, and Andrew took it into mm, Linus
> was CCed too. It missed the release because I guess it's comments only
> change, but otherwise it should reach upstream tree on the next merge
> window.
> 
> WARN is _not_ a common way to notify users today. syzbot reports _all_
> WARN occurrences and you can see there are not many of them now
> (probably 1 another now, +dtor for that one):
> https://syzkaller.appspot.com#upstream
> There is probably some long tail that we need to fix. We really do
> want systematic testing capability. You do not want every of 2 billion
> linux users to come to you with this kernel splat, just so that you
> can explain to them that it's some programs of their machines doing
> something wrong, right?

[This is an orthogonal discussion I believe]

How does it differ from pr_err though?

> WARN is really a bad way to inform a user about something. Consider a
> non-kernel developer, perhaps even non-programmer. What they see is
> "WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710
> try_charge+0x734/0x1680" followed by some obscure things and hex
> numbers. File:line reference is pointless, they don't what what/where
> it is.

Well, you get a precise location where the problem happens and the
backtrace to see how it happened. This is much more information than,
pr_err without dump_stack.

> This one is slightly better because it prints "Memory cgroup
> charge failed because of no reclaimable memory! This looks like a
> misconfiguration or a kernel bug." before the warning. But still it
> says "or a kernel bug", which means that they will come to you.

Yeah, and that was the purpose of the thing.

> A much
> friendlier for user way to say this would be print a message at the
> point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
> misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
> trace (does not give any useful info for user). And return EINVAL if
> it can't fly at all? And then leave the "or a kernel bug" part for the
> WARNING each occurrence of which we do want to be reported to kernel
> developers.

But this is not applicable here. Your misconfiguration is quite obvious
because you simply set the hard limit to 0. This is not the only
situation when this can happen. There is no clear point to tell, you are
doing this wrong. If it was we would do it at that point obviously.

If you have a strong reason to believe that this is an abuse of WARN I
am all happy to change that. But I haven't heard any yet, to be honest.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread syzbot

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:


Reported-and-tested-by:  
syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com


Tested on:

commit: 1ffaddd029c8 Linux 4.18-rc8
git tree:   upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=3bdb367561cb7285
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
patch:  https://syzkaller.appspot.com/x/patch.diff?x=146f983040

Note: testing is done by a robot and is best-effort only.


Re: WARNING in try_charge

2018-08-06 Thread syzbot

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger  
crash:


Reported-and-tested-by:  
syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com


Tested on:

commit: 1ffaddd029c8 Linux 4.18-rc8
git tree:   upstream
kernel config:  https://syzkaller.appspot.com/x/.config?x=3bdb367561cb7285
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
patch:  https://syzkaller.appspot.com/x/patch.diff?x=146f983040

Note: testing is done by a robot and is best-effort only.


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/06 19:39, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko  wrote:
>> Btw. running with the above diff on top might help us to ideantify
>> whether this is a pre-mature warning or a valid one. Still useful to
>> find out.

Since syzbot already found a syz reproducer, you can ask syzbot to test it.

> 
> The bug report has a reproducer, so you can run it with the patch. Or
> ask syzbot to test your patch:
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
> Which basically boils down to saying:
> 
> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> master

Excuse me, but this is linux-next only problem. Therefore,

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
master

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..852cd3dbdcd9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup 
*memcg, gfp_t gfp_mask,
bool ret;
 
mutex_lock(_lock);
+   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
+   current->comm, current->pid, 
tsk_is_oom_victim(current));
ret = out_of_memory();
mutex_unlock(_lock);
return ret;

F.Y.I. Waiting until __mmput() completes (with timeout using OOM score feedback)
( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solves this 
race.


Re: WARNING in try_charge

2018-08-06 Thread Tetsuo Handa
On 2018/08/06 19:39, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko  wrote:
>> Btw. running with the above diff on top might help us to ideantify
>> whether this is a pre-mature warning or a valid one. Still useful to
>> find out.

Since syzbot already found a syz reproducer, you can ask syzbot to test it.

> 
> The bug report has a reproducer, so you can run it with the patch. Or
> ask syzbot to test your patch:
> https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
> Which basically boils down to saying:
> 
> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> master

Excuse me, but this is linux-next only problem. Therefore,

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
master

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..852cd3dbdcd9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup 
*memcg, gfp_t gfp_mask,
bool ret;
 
mutex_lock(_lock);
+   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
+   current->comm, current->pid, 
tsk_is_oom_victim(current));
ret = out_of_memory();
mutex_unlock(_lock);
return ret;

F.Y.I. Waiting until __mmput() completes (with timeout using OOM score feedback)
( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solves this 
race.


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko  wrote:
> On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote:
>> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko  wrote:
> [...]
>> > More interesting stuff is higher in the kernel log
>> > : [  366.435015] 
>> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
>> > : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
>> >
>> > Are you sure you want to have hard limit set to 0?
>>
>> syzkaller really does not mind to have it.
>
> So what do you use it for? What do you actually test by this setting?
>
> [...]
>> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> > index 4603ad75c9a9..852cd3dbdcd9 100644
>> > --- a/mm/memcontrol.c
>> > +++ b/mm/memcontrol.c
>> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct 
>> > mem_cgroup *memcg, gfp_t gfp_mask,
>> > bool ret;
>> >
>> > mutex_lock(_lock);
>> > +   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
>> > +   current->comm, current->pid, 
>> > tsk_is_oom_victim(current));
>> > ret = out_of_memory();
>> > mutex_unlock(_lock);
>> > return ret;
>> >
>> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit
>> > and basically no memory charged by existing tasks is not going to fly
>> > and the warning is exactly to call that out.
>>
>>
>> Please-please-please do not mix kernel bugs and notices to user into
>> the same bucket:
>
> Well, WARN_ON used to be a standard way to make user aware of a
> misbehavior. In this case it warns about a pottential runaway when memcg
> is misconfigured. I do not insist on using WARN_ON here of course. If
> there is a general agreement that such a condition is better handled by
> pr_err then I am fine with it. Users tend to be more sensitive on
> WARN_ONs though.
>
> Btw. running with the above diff on top might help us to ideantify
> whether this is a pre-mature warning or a valid one. Still useful to
> find out.

The bug report has a reproducer, so you can run it with the patch. Or
ask syzbot to test your patch:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
Which basically boils down to saying:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
master

Note that a text patch without a base tree/commit can be useless, I
used torvalds/linux.git but I don't know if it will apply there or
not. Let's see.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..852cd3dbdcd9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	bool ret;
 
 	mutex_lock(_lock);
+	pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
+			current->comm, current->pid, tsk_is_oom_victim(current));
 	ret = out_of_memory();
 	mutex_unlock(_lock);
 	return ret;



Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko  wrote:
> On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote:
>> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko  wrote:
> [...]
>> > More interesting stuff is higher in the kernel log
>> > : [  366.435015] 
>> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
>> > : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
>> >
>> > Are you sure you want to have hard limit set to 0?
>>
>> syzkaller really does not mind to have it.
>
> So what do you use it for? What do you actually test by this setting?
>
> [...]
>> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> > index 4603ad75c9a9..852cd3dbdcd9 100644
>> > --- a/mm/memcontrol.c
>> > +++ b/mm/memcontrol.c
>> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct 
>> > mem_cgroup *memcg, gfp_t gfp_mask,
>> > bool ret;
>> >
>> > mutex_lock(_lock);
>> > +   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
>> > +   current->comm, current->pid, 
>> > tsk_is_oom_victim(current));
>> > ret = out_of_memory();
>> > mutex_unlock(_lock);
>> > return ret;
>> >
>> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit
>> > and basically no memory charged by existing tasks is not going to fly
>> > and the warning is exactly to call that out.
>>
>>
>> Please-please-please do not mix kernel bugs and notices to user into
>> the same bucket:
>
> Well, WARN_ON used to be a standard way to make user aware of a
> misbehavior. In this case it warns about a pottential runaway when memcg
> is misconfigured. I do not insist on using WARN_ON here of course. If
> there is a general agreement that such a condition is better handled by
> pr_err then I am fine with it. Users tend to be more sensitive on
> WARN_ONs though.
>
> Btw. running with the above diff on top might help us to ideantify
> whether this is a pre-mature warning or a valid one. Still useful to
> find out.

The bug report has a reproducer, so you can run it with the patch. Or
ask syzbot to test your patch:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
Which basically boils down to saying:

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
master

Note that a text patch without a base tree/commit can be useless, I
used torvalds/linux.git but I don't know if it will apply there or
not. Let's see.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4603ad75c9a9..852cd3dbdcd9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	bool ret;
 
 	mutex_lock(_lock);
+	pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
+			current->comm, current->pid, tsk_is_oom_victim(current));
 	ret = out_of_memory();
 	mutex_unlock(_lock);
 	return ret;



Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko  wrote:
> On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote:
>> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko  wrote:
> [...]
>> > More interesting stuff is higher in the kernel log
>> > : [  366.435015] 
>> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
>> > : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
>> >
>> > Are you sure you want to have hard limit set to 0?
>>
>> syzkaller really does not mind to have it.
>
> So what do you use it for? What do you actually test by this setting?

syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is
doable from user-space. Some of that may not make sense, but it does
not matter because kernel should still stand still.

> [...]
>> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> > index 4603ad75c9a9..852cd3dbdcd9 100644
>> > --- a/mm/memcontrol.c
>> > +++ b/mm/memcontrol.c
>> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct 
>> > mem_cgroup *memcg, gfp_t gfp_mask,
>> > bool ret;
>> >
>> > mutex_lock(_lock);
>> > +   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
>> > +   current->comm, current->pid, 
>> > tsk_is_oom_victim(current));
>> > ret = out_of_memory();
>> > mutex_unlock(_lock);
>> > return ret;
>> >
>> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit
>> > and basically no memory charged by existing tasks is not going to fly
>> > and the warning is exactly to call that out.
>>
>>
>> Please-please-please do not mix kernel bugs and notices to user into
>> the same bucket:
>
> Well, WARN_ON used to be a standard way to make user aware of a
> misbehavior. In this case it warns about a pottential runaway when memcg
> is misconfigured. I do not insist on using WARN_ON here of course. If
> there is a general agreement that such a condition is better handled by
> pr_err then I am fine with it. Users tend to be more sensitive on
> WARN_ONs though.

The docs change was acked by Greg, and Andrew took it into mm, Linus
was CCed too. It missed the release because I guess it's comments only
change, but otherwise it should reach upstream tree on the next merge
window.

WARN is _not_ a common way to notify users today. syzbot reports _all_
WARN occurrences and you can see there are not many of them now
(probably 1 another now, +dtor for that one):
https://syzkaller.appspot.com#upstream
There is probably some long tail that we need to fix. We really do
want systematic testing capability. You do not want every of 2 billion
linux users to come to you with this kernel splat, just so that you
can explain to them that it's some programs of their machines doing
something wrong, right?

WARN is really a bad way to inform a user about something. Consider a
non-kernel developer, perhaps even non-programmer. What they see is
"WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710
try_charge+0x734/0x1680" followed by some obscure things and hex
numbers. File:line reference is pointless, they don't what what/where
it is. This one is slightly better because it prints "Memory cgroup
charge failed because of no reclaimable memory! This looks like a
misconfiguration or a kernel bug." before the warning. But still it
says "or a kernel bug", which means that they will come to you. A much
friendlier for user way to say this would be print a message at the
point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
trace (does not give any useful info for user). And return EINVAL if
it can't fly at all? And then leave the "or a kernel bug" part for the
WARNING each occurrence of which we do want to be reported to kernel
developers.


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 11:48 AM, Michal Hocko  wrote:
> On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote:
>> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko  wrote:
> [...]
>> > More interesting stuff is higher in the kernel log
>> > : [  366.435015] 
>> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
>> > : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
>> >
>> > Are you sure you want to have hard limit set to 0?
>>
>> syzkaller really does not mind to have it.
>
> So what do you use it for? What do you actually test by this setting?

syzkaller is kernel fuzzer, it finds kernel bugs by doing whatever is
doable from user-space. Some of that may not make sense, but it does
not matter because kernel should still stand still.

> [...]
>> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> > index 4603ad75c9a9..852cd3dbdcd9 100644
>> > --- a/mm/memcontrol.c
>> > +++ b/mm/memcontrol.c
>> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct 
>> > mem_cgroup *memcg, gfp_t gfp_mask,
>> > bool ret;
>> >
>> > mutex_lock(_lock);
>> > +   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
>> > +   current->comm, current->pid, 
>> > tsk_is_oom_victim(current));
>> > ret = out_of_memory();
>> > mutex_unlock(_lock);
>> > return ret;
>> >
>> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit
>> > and basically no memory charged by existing tasks is not going to fly
>> > and the warning is exactly to call that out.
>>
>>
>> Please-please-please do not mix kernel bugs and notices to user into
>> the same bucket:
>
> Well, WARN_ON used to be a standard way to make user aware of a
> misbehavior. In this case it warns about a pottential runaway when memcg
> is misconfigured. I do not insist on using WARN_ON here of course. If
> there is a general agreement that such a condition is better handled by
> pr_err then I am fine with it. Users tend to be more sensitive on
> WARN_ONs though.

The docs change was acked by Greg, and Andrew took it into mm, Linus
was CCed too. It missed the release because I guess it's comments only
change, but otherwise it should reach upstream tree on the next merge
window.

WARN is _not_ a common way to notify users today. syzbot reports _all_
WARN occurrences and you can see there are not many of them now
(probably 1 another now, +dtor for that one):
https://syzkaller.appspot.com#upstream
There is probably some long tail that we need to fix. We really do
want systematic testing capability. You do not want every of 2 billion
linux users to come to you with this kernel splat, just so that you
can explain to them that it's some programs of their machines doing
something wrong, right?

WARN is really a bad way to inform a user about something. Consider a
non-kernel developer, perhaps even non-programmer. What they see is
"WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710
try_charge+0x734/0x1680" followed by some obscure things and hex
numbers. File:line reference is pointless, they don't what what/where
it is. This one is slightly better because it prints "Memory cgroup
charge failed because of no reclaimable memory! This looks like a
misconfiguration or a kernel bug." before the warning. But still it
says "or a kernel bug", which means that they will come to you. A much
friendlier for user way to say this would be print a message at the
point of misconfiguration saying what exactly is wrong, e.g. "pid $PID
misconfigures cgroup /cgroup/path with mem.limit=0" without a stack
trace (does not give any useful info for user). And return EINVAL if
it can't fly at all? And then leave the "or a kernel bug" part for the
WARNING each occurrence of which we do want to be reported to kernel
developers.


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko  wrote:
[...]
> > More interesting stuff is higher in the kernel log
> > : [  366.435015] 
> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
> > : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
> >
> > Are you sure you want to have hard limit set to 0?
> 
> syzkaller really does not mind to have it.

So what do you use it for? What do you actually test by this setting?

[...]
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 4603ad75c9a9..852cd3dbdcd9 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct 
> > mem_cgroup *memcg, gfp_t gfp_mask,
> > bool ret;
> >
> > mutex_lock(_lock);
> > +   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
> > +   current->comm, current->pid, 
> > tsk_is_oom_victim(current));
> > ret = out_of_memory();
> > mutex_unlock(_lock);
> > return ret;
> >
> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit
> > and basically no memory charged by existing tasks is not going to fly
> > and the warning is exactly to call that out.
> 
> 
> Please-please-please do not mix kernel bugs and notices to user into
> the same bucket:

Well, WARN_ON used to be a standard way to make user aware of a
misbehavior. In this case it warns about a pottential runaway when memcg
is misconfigured. I do not insist on using WARN_ON here of course. If
there is a general agreement that such a condition is better handled by
pr_err then I am fine with it. Users tend to be more sensitive on
WARN_ONs though.

Btw. running with the above diff on top might help us to ideantify
whether this is a pre-mature warning or a valid one. Still useful to
find out.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Mon 06-08-18 11:30:37, Dmitry Vyukov wrote:
> On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko  wrote:
[...]
> > More interesting stuff is higher in the kernel log
> > : [  366.435015] 
> > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
> > : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
> >
> > Are you sure you want to have hard limit set to 0?
> 
> syzkaller really does not mind to have it.

So what do you use it for? What do you actually test by this setting?

[...]
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 4603ad75c9a9..852cd3dbdcd9 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1388,6 +1388,8 @@ static bool mem_cgroup_out_of_memory(struct 
> > mem_cgroup *memcg, gfp_t gfp_mask,
> > bool ret;
> >
> > mutex_lock(_lock);
> > +   pr_info("task=%s pid=%d invoked memcg oom killer. oom_victim=%d\n",
> > +   current->comm, current->pid, 
> > tsk_is_oom_victim(current));
> > ret = out_of_memory();
> > mutex_unlock(_lock);
> > return ret;
> >
> > Anyway your memcg setup is indeed misconfigured. Memcg with 0 hard limit
> > and basically no memory charged by existing tasks is not going to fly
> > and the warning is exactly to call that out.
> 
> 
> Please-please-please do not mix kernel bugs and notices to user into
> the same bucket:

Well, WARN_ON used to be a standard way to make user aware of a
misbehavior. In this case it warns about a pottential runaway when memcg
is misconfigured. I do not insist on using WARN_ON here of course. If
there is a general agreement that such a condition is better handled by
pr_err then I am fine with it. Users tend to be more sensitive on
WARN_ONs though.

Btw. running with the above diff on top might help us to ideantify
whether this is a pre-mature warning or a valid one. Still useful to
find out.
-- 
Michal Hocko
SUSE Labs


Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko  wrote:
> On Sat 04-08-18 06:33:02, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:d1e0b8e0cb7a Add linux-next specific files for 20180725
>> git tree:   linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=15a1c77040
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=eef3552c897e4d33
>> dashboard link: https://syzkaller.appspot.com/bug?extid=bab151e82a4e973fa325
>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
>>
>> Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB,
>> file-rss:32768kB, shmem-rss:0kB
>> oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB,
>> file-rss:32000kB, shmem-rss:0kB
>
> More interesting stuff is higher in the kernel log
> : [  366.435015] 
> oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
> : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
>
> Are you sure you want to have hard limit set to 0?

syzkaller really does not mind to have it.

> : [  366.454963] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  366.461787] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  366.467946] Memory cgroup stats for /ile0: cache:12KB rss:0KB 
> rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB 
> inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB 
> unevictable:0KB
>
> There are only 3 pages charged to this memcg!
>
> : [  366.487490] Tasks state (memory values in pages):
> : [  366.492349] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
> swapents oom_score_adj name
> : [  366.501237] [  23766] 0 2376617620 8221   1269760
>  0 syz-executor3
> : [  366.510367] [  23767] 0 2376717618 8218   1269760
>  0 syz-executor2
> : [  366.519409] Memory cgroup out of memory: Kill process 23766 
> (syz-executor3) score 8252000 or sacrifice child
> : [  366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, 
> anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB
> : [  366.540456] oom_reaper: reaped process 23766 (syz-executor3), now 
> anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB
>
> The oom reaper cannot reclaim file backed memory  from a large part. I
> assume this is are shared mappings which are living outside of memcg
> because of the counter.
>
> : [...]
> : [  367.085870] 
> oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor2,pid=23767,uid=0
> : [  367.100073] memory: usage 112kB, limit 0kB, failcnt 1615
> : [  367.105549] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  367.112428] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  367.118593] Memory cgroup stats for /ile0: cache:12KB rss:0KB 
> rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB 
> inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB 
> unevictable:0KB
> : [  367.138136] Tasks state (memory values in pages):
> : [  367.142986] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
> swapents oom_score_adj name
> : [  367.151889] [  23766] 0 2376617620 8002   1269760
>  0 syz-executor3
> : [  367.160946] [  23767] 0 2376717618 8218   1269760
>  0 syz-executor2
> : [  367.169994] Memory cgroup out of memory: Kill process 23767 
> (syz-executor2) score 8249000 or sacrifice child
> : [  367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, 
> anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB
> : [  367.192101] oom_reaper: reaped process 23767 (syz-executor2), now 
> anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB
> : [  367.202986] [ cut here ]
> : [  367.207845] Memory cgroup charge failed because of no reclaimable 
> memory! This looks like a misconfiguration or a kernel bug.
> : [  367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 
> try_charge+0x734/0x1680
> : [  367.227540] Kernel panic - not syncing: panic_on_warn set ...
>
> This is unexpected though. We have killed a task (23767) which is trying
> to charge the memory which means it should
> trigger the charge retry and that one should force the charge
>
> /*
>  * Unlike in global OOM situations, memcg is not in a physical
>  * memory shortage.  Allow dying and OOM-killed tasks to
>  * bypass the last charges so that they can exit quickly and
>  * free their memory.
>  */
> if (unlikely(tsk_is_oom_victim(current) ||
>  

Re: WARNING in try_charge

2018-08-06 Thread Dmitry Vyukov
On Mon, Aug 6, 2018 at 11:15 AM, Michal Hocko  wrote:
> On Sat 04-08-18 06:33:02, syzbot wrote:
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:d1e0b8e0cb7a Add linux-next specific files for 20180725
>> git tree:   linux-next
>> console output: https://syzkaller.appspot.com/x/log.txt?x=15a1c77040
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=eef3552c897e4d33
>> dashboard link: https://syzkaller.appspot.com/bug?extid=bab151e82a4e973fa325
>> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
>>
>> Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB,
>> file-rss:32768kB, shmem-rss:0kB
>> oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB,
>> file-rss:32000kB, shmem-rss:0kB
>
> More interesting stuff is higher in the kernel log
> : [  366.435015] 
> oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
> : [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605
>
> Are you sure you want to have hard limit set to 0?

syzkaller really does not mind to have it.

> : [  366.454963] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  366.461787] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  366.467946] Memory cgroup stats for /ile0: cache:12KB rss:0KB 
> rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB 
> inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB 
> unevictable:0KB
>
> There are only 3 pages charged to this memcg!
>
> : [  366.487490] Tasks state (memory values in pages):
> : [  366.492349] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
> swapents oom_score_adj name
> : [  366.501237] [  23766] 0 2376617620 8221   1269760
>  0 syz-executor3
> : [  366.510367] [  23767] 0 2376717618 8218   1269760
>  0 syz-executor2
> : [  366.519409] Memory cgroup out of memory: Kill process 23766 
> (syz-executor3) score 8252000 or sacrifice child
> : [  366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, 
> anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB
> : [  366.540456] oom_reaper: reaped process 23766 (syz-executor3), now 
> anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB
>
> The oom reaper cannot reclaim file backed memory  from a large part. I
> assume this is are shared mappings which are living outside of memcg
> because of the counter.
>
> : [...]
> : [  367.085870] 
> oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor2,pid=23767,uid=0
> : [  367.100073] memory: usage 112kB, limit 0kB, failcnt 1615
> : [  367.105549] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  367.112428] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
> : [  367.118593] Memory cgroup stats for /ile0: cache:12KB rss:0KB 
> rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB 
> inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB 
> unevictable:0KB
> : [  367.138136] Tasks state (memory values in pages):
> : [  367.142986] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
> swapents oom_score_adj name
> : [  367.151889] [  23766] 0 2376617620 8002   1269760
>  0 syz-executor3
> : [  367.160946] [  23767] 0 2376717618 8218   1269760
>  0 syz-executor2
> : [  367.169994] Memory cgroup out of memory: Kill process 23767 
> (syz-executor2) score 8249000 or sacrifice child
> : [  367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, 
> anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB
> : [  367.192101] oom_reaper: reaped process 23767 (syz-executor2), now 
> anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB
> : [  367.202986] [ cut here ]
> : [  367.207845] Memory cgroup charge failed because of no reclaimable 
> memory! This looks like a misconfiguration or a kernel bug.
> : [  367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 
> try_charge+0x734/0x1680
> : [  367.227540] Kernel panic - not syncing: panic_on_warn set ...
>
> This is unexpected though. We have killed a task (23767) which is trying
> to charge the memory which means it should
> trigger the charge retry and that one should force the charge
>
> /*
>  * Unlike in global OOM situations, memcg is not in a physical
>  * memory shortage.  Allow dying and OOM-killed tasks to
>  * bypass the last charges so that they can exit quickly and
>  * free their memory.
>  */
> if (unlikely(tsk_is_oom_victim(current) ||
>  

Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Sat 04-08-18 06:33:02, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:d1e0b8e0cb7a Add linux-next specific files for 20180725
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=15a1c77040
> kernel config:  https://syzkaller.appspot.com/x/.config?x=eef3552c897e4d33
> dashboard link: https://syzkaller.appspot.com/bug?extid=bab151e82a4e973fa325
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
> 
> Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB,
> file-rss:32768kB, shmem-rss:0kB
> oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB,
> file-rss:32000kB, shmem-rss:0kB

More interesting stuff is higher in the kernel log
: [  366.435015] 
oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
: [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605

Are you sure you want to have hard limit set to 0?

: [  366.454963] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
: [  366.461787] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
: [  366.467946] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB 
shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB

There are only 3 pages charged to this memcg!

: [  366.487490] Tasks state (memory values in pages):
: [  366.492349] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
swapents oom_score_adj name
: [  366.501237] [  23766] 0 2376617620 8221   1269760  
   0 syz-executor3
: [  366.510367] [  23767] 0 2376717618 8218   1269760  
   0 syz-executor2
: [  366.519409] Memory cgroup out of memory: Kill process 23766 
(syz-executor3) score 8252000 or sacrifice child
: [  366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, 
anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB
: [  366.540456] oom_reaper: reaped process 23766 (syz-executor3), now 
anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB

The oom reaper cannot reclaim file backed memory  from a large part. I
assume this is are shared mappings which are living outside of memcg
because of the counter.

: [...]
: [  367.085870] 
oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor2,pid=23767,uid=0
: [  367.100073] memory: usage 112kB, limit 0kB, failcnt 1615
: [  367.105549] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
: [  367.112428] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
: [  367.118593] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB 
shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
: [  367.138136] Tasks state (memory values in pages):
: [  367.142986] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
swapents oom_score_adj name
: [  367.151889] [  23766] 0 2376617620 8002   1269760  
   0 syz-executor3
: [  367.160946] [  23767] 0 2376717618 8218   1269760  
   0 syz-executor2
: [  367.169994] Memory cgroup out of memory: Kill process 23767 
(syz-executor2) score 8249000 or sacrifice child
: [  367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, 
anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB
: [  367.192101] oom_reaper: reaped process 23767 (syz-executor2), now 
anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB 
: [  367.202986] [ cut here ]
: [  367.207845] Memory cgroup charge failed because of no reclaimable memory! 
This looks like a misconfiguration or a kernel bug.
: [  367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 
try_charge+0x734/0x1680
: [  367.227540] Kernel panic - not syncing: panic_on_warn set ...

This is unexpected though. We have killed a task (23767) which is trying
to charge the memory which means it should
trigger the charge retry and that one should force the charge

/*
 * Unlike in global OOM situations, memcg is not in a physical
 * memory shortage.  Allow dying and OOM-killed tasks to
 * bypass the last charges so that they can exit quickly and
 * free their memory.
 */
if (unlikely(tsk_is_oom_victim(current) ||
 fatal_signal_pending(current) ||
 current->flags & PF_EXITING))
goto force;

There doesn't seem to be any other sign of OOM killer invocation which
could then indeed lead to the warning as there is no other task to kill
(both 

Re: WARNING in try_charge

2018-08-06 Thread Michal Hocko
On Sat 04-08-18 06:33:02, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:d1e0b8e0cb7a Add linux-next specific files for 20180725
> git tree:   linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=15a1c77040
> kernel config:  https://syzkaller.appspot.com/x/.config?x=eef3552c897e4d33
> dashboard link: https://syzkaller.appspot.com/bug?extid=bab151e82a4e973fa325
> compiler:   gcc (GCC) 8.0.1 20180413 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+bab151e82a4e973fa...@syzkaller.appspotmail.com
> 
> Killed process 23767 (syz-executor2) total-vm:70472kB, anon-rss:104kB,
> file-rss:32768kB, shmem-rss:0kB
> oom_reaper: reaped process 23767 (syz-executor2), now anon-rss:0kB,
> file-rss:32000kB, shmem-rss:0kB

More interesting stuff is higher in the kernel log
: [  366.435015] 
oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor3,pid=23766,uid=0
: [  366.449416] memory: usage 112kB, limit 0kB, failcnt 1605

Are you sure you want to have hard limit set to 0?

: [  366.454963] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
: [  366.461787] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
: [  366.467946] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB 
shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB

There are only 3 pages charged to this memcg!

: [  366.487490] Tasks state (memory values in pages):
: [  366.492349] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
swapents oom_score_adj name
: [  366.501237] [  23766] 0 2376617620 8221   1269760  
   0 syz-executor3
: [  366.510367] [  23767] 0 2376717618 8218   1269760  
   0 syz-executor2
: [  366.519409] Memory cgroup out of memory: Kill process 23766 
(syz-executor3) score 8252000 or sacrifice child
: [  366.529422] Killed process 23766 (syz-executor3) total-vm:70480kB, 
anon-rss:116kB, file-rss:32768kB, shmem-rss:0kB
: [  366.540456] oom_reaper: reaped process 23766 (syz-executor3), now 
anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB

The oom reaper cannot reclaim file backed memory  from a large part. I
assume this is are shared mappings which are living outside of memcg
because of the counter.

: [...]
: [  367.085870] 
oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/ile0,task_memcg=/ile0,task=syz-executor2,pid=23767,uid=0
: [  367.100073] memory: usage 112kB, limit 0kB, failcnt 1615
: [  367.105549] memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
: [  367.112428] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
: [  367.118593] Memory cgroup stats for /ile0: cache:12KB rss:0KB rss_huge:0KB 
shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB 
active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
: [  367.138136] Tasks state (memory values in pages):
: [  367.142986] [  pid  ]   uid  tgid total_vm  rss pgtables_bytes 
swapents oom_score_adj name
: [  367.151889] [  23766] 0 2376617620 8002   1269760  
   0 syz-executor3
: [  367.160946] [  23767] 0 2376717618 8218   1269760  
   0 syz-executor2
: [  367.169994] Memory cgroup out of memory: Kill process 23767 
(syz-executor2) score 8249000 or sacrifice child
: [  367.180119] Killed process 23767 (syz-executor2) total-vm:70472kB, 
anon-rss:104kB, file-rss:32768kB, shmem-rss:0kB
: [  367.192101] oom_reaper: reaped process 23767 (syz-executor2), now 
anon-rss:0kB, file-rss:32000kB, shmem-rss:0kB 
: [  367.202986] [ cut here ]
: [  367.207845] Memory cgroup charge failed because of no reclaimable memory! 
This looks like a misconfiguration or a kernel bug.
: [  367.207965] WARNING: CPU: 1 PID: 23767 at mm/memcontrol.c:1710 
try_charge+0x734/0x1680
: [  367.227540] Kernel panic - not syncing: panic_on_warn set ...

This is unexpected though. We have killed a task (23767) which is trying
to charge the memory which means it should
trigger the charge retry and that one should force the charge

/*
 * Unlike in global OOM situations, memcg is not in a physical
 * memory shortage.  Allow dying and OOM-killed tasks to
 * bypass the last charges so that they can exit quickly and
 * free their memory.
 */
if (unlikely(tsk_is_oom_victim(current) ||
 fatal_signal_pending(current) ||
 current->flags & PF_EXITING))
goto force;

There doesn't seem to be any other sign of OOM killer invocation which
could then indeed lead to the warning as there is no other task to kill
(both 

Re: WARNING in try_charge

2018-08-05 Thread Tetsuo Handa
On 2018/08/04 22:45, Tetsuo Handa wrote:
> syzbot is hitting WARN(1) because of mem_cgroup_out_of_memory() == false.

Since syzbot found a syz reproducer, I asked syzbot to try two patches.

Setting MMF_OOM_SKIP under oom_lock to prevent from races
( https://syzkaller.appspot.com/x/patch.diff?x=10fb3fd040 ) was not 
sufficient.

Waiting until __mmput() completes (with timeout using OOM score feedback)
( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solved this 
race.


Re: WARNING in try_charge

2018-08-05 Thread Tetsuo Handa
On 2018/08/04 22:45, Tetsuo Handa wrote:
> syzbot is hitting WARN(1) because of mem_cgroup_out_of_memory() == false.

Since syzbot found a syz reproducer, I asked syzbot to try two patches.

Setting MMF_OOM_SKIP under oom_lock to prevent from races
( https://syzkaller.appspot.com/x/patch.diff?x=10fb3fd040 ) was not 
sufficient.

Waiting until __mmput() completes (with timeout using OOM score feedback)
( https://syzkaller.appspot.com/x/patch.diff?x=101e449c40 ) solved this 
race.


  1   2   >