Re: [RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter

2017-07-27 Thread Michal Hocko
On Thu 27-07-17 11:30:50, Wenwei Tao wrote:
> 2017-07-26 21:44 GMT+08:00 Michal Hocko :
> > On Wed 26-07-17 21:07:42, Wenwei Tao wrote:
[...]
> >> I think there is a css double put in mem_cgroup_iter. Under reclaim,
> >> we call mem_cgroup_iter the first time with prev == NULL, get the
> >> last_visited memcg from the per-zone reclaim_iter, and then call
> >> __mem_cgroup_iter_next to find the next live memcg.
> >> __mem_cgroup_iter_next can return NULL if last_visited is already the
> >> last one, so we put last_visited's css and go around the while loop
> >> again. This time we might not do css_tryget(&last_visited->css) if
> >> the dead_count has changed, but we still do
> >> css_put(&last_visited->css). The css is put twice, which can trigger
> >> the BUG_ON at kernel/cgroup.c:893.
> >
> > Yes, I guess you are right, and I suspect that this has been silently
> > fixed by 519ebea3bf6d ("mm: memcontrol: factor out reclaim iterator
> > loading and updating"). I think a more appropriate fix would be the
> > following. Are you able to reproduce and re-test it?
> > ---
> 
> Yes, I think this commit fixes the issue. I backported it to the
> 3.10.107 kernel and can no longer reproduce the problem. I guess
> this commit might need to be backported to the 3.10.y stable kernel.

Please send it to the kernel-stable mailing list. 3.10 still seems to be
maintained.

-- 
Michal Hocko
SUSE Labs


Re: [RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter

2017-07-26 Thread Wenwei Tao
2017-07-26 21:44 GMT+08:00 Michal Hocko :
> On Wed 26-07-17 21:07:42, Wenwei Tao wrote:
>> From: Wenwei Tao 
>>
>> By removing the child cgroup while the parent cgroup is
>> under reclaim, we could trigger the following kernel panic
>> on kernel 3.10:
>> 
>> kernel BUG at kernel/cgroup.c:893!
>>  invalid opcode:  [#1] SMP
>>  CPU: 1 PID: 22477 Comm: kworker/1:1 Not tainted 3.10.107 #1
>>  Workqueue: cgroup_destroy css_dput_fn
>>  task: 8817959a5780 ti: 8817e8886000 task.ti: 8817e8886000
>>  RIP: 0010:[]  []
>> cgroup_diput+0xc0/0xf0
>>  RSP: :8817e8887da0  EFLAGS: 00010246
>>  RAX:  RBX: 8817a5dd5d40 RCX: dead0200
>>  RDX:  RSI: 8817973a6910 RDI: 8817f54c2a00
>>  RBP: 8817e8887dc8 R08: 8817a5dd5dd0 R09: df9fb35794b01820
>>  R10: df9fb35794b01820 R11: 7fa95b1efcda R12: 8817a5dd5d9c
>>  R13: 8817f38b3a40 R14: 8817973a6910 R15: 8817973a6910
>>  FS:  () GS:88181f22()
>> knlGS:
>>  CS:  0010 DS:  ES:  CR0: 80050033
>>  CR2: 7fa6e6234000 CR3: 00179f19d000 CR4: 000407e0
>>  DR0:  DR1:  DR2: 
>>  DR3:  DR6: 0ff0 DR7: 0400
>>  Stack:
>>   8817a5dd5d40 8817a5dd5d9c 8817f38b3a40 8817973a6910
>>   0040 8817e8887df8 811b37c2 8817fa23c000
>>   8817f57dbb80 88181f232ac0 88181f237500 8817e8887e10
>>  Call Trace:
>>   [] dput+0x1a2/0x2f0
>>   [] cgroup_dput.isra.21+0x1c/0x30
>>   [] css_dput_fn+0x1d/0x20
>>   [] process_one_work+0x17c/0x460
>>   [] worker_thread+0x116/0x3b0
>>   [] ? manage_workers.isra.25+0x290/0x290
>>   [] kthread+0xc0/0xd0
>>   [] ? insert_kthread_work+0x40/0x40
>>   [] ret_from_fork+0x58/0x90
>>   [] ? insert_kthread_work+0x40/0x40
>>  Code: 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 7f 78 48 8b 07 a8 01 74 15
>> 48 81 c7 30 01 00 00 48 c7 c6 a0 a7 0c 81 e8 b2 83 02 00 eb c8 <0f> 0b
>> 49 8b 4e 18 48 c7 c2 7e f1 7a 81 be 85 03 00 00 48 c7 c7
>>  RIP  [] cgroup_diput+0xc0/0xf0
>>  RSP 
>>  ---[ end trace 85eeea5212c44f51 ]---
>> 
>>
>> I think there is a css double put in mem_cgroup_iter. Under reclaim,
>> we call mem_cgroup_iter the first time with prev == NULL, get the
>> last_visited memcg from the per-zone reclaim_iter, and then call
>> __mem_cgroup_iter_next to find the next live memcg.
>> __mem_cgroup_iter_next can return NULL if last_visited is already the
>> last one, so we put last_visited's css and go around the while loop
>> again. This time we might not do css_tryget(&last_visited->css) if
>> the dead_count has changed, but we still do
>> css_put(&last_visited->css). The css is put twice, which can trigger
>> the BUG_ON at kernel/cgroup.c:893.
>
> Yes, I guess you are right, and I suspect that this has been silently
> fixed by 519ebea3bf6d ("mm: memcontrol: factor out reclaim iterator
> loading and updating"). I think a more appropriate fix would be the
> following. Are you able to reproduce and re-test it?
> ---

Yes, I think this commit fixes the issue. I backported it to the
3.10.107 kernel and can no longer reproduce the problem. I guess
this commit might need to be backported to the 3.10.y stable kernel.

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 437ae2cbe102..0848ec05c12a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1224,6 +1224,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
>  				if (last_visited && last_visited != root &&
>  				    !css_tryget(&last_visited->css))
>  					last_visited = NULL;
> +			} else {
> +				last_visited = true;
>  			}
>  		}
>
> --
> Michal Hocko
> SUSE Labs


Re: [RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter

2017-07-26 Thread Michal Hocko
On Wed 26-07-17 21:07:42, Wenwei Tao wrote:
> From: Wenwei Tao 
> 
> By removing the child cgroup while the parent cgroup is
> under reclaim, we could trigger the following kernel panic
> on kernel 3.10:
> 
> kernel BUG at kernel/cgroup.c:893!
>  invalid opcode:  [#1] SMP
>  CPU: 1 PID: 22477 Comm: kworker/1:1 Not tainted 3.10.107 #1
>  Workqueue: cgroup_destroy css_dput_fn
>  task: 8817959a5780 ti: 8817e8886000 task.ti: 8817e8886000
>  RIP: 0010:[]  []
> cgroup_diput+0xc0/0xf0
>  RSP: :8817e8887da0  EFLAGS: 00010246
>  RAX:  RBX: 8817a5dd5d40 RCX: dead0200
>  RDX:  RSI: 8817973a6910 RDI: 8817f54c2a00
>  RBP: 8817e8887dc8 R08: 8817a5dd5dd0 R09: df9fb35794b01820
>  R10: df9fb35794b01820 R11: 7fa95b1efcda R12: 8817a5dd5d9c
>  R13: 8817f38b3a40 R14: 8817973a6910 R15: 8817973a6910
>  FS:  () GS:88181f22()
> knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 7fa6e6234000 CR3: 00179f19d000 CR4: 000407e0
>  DR0:  DR1:  DR2: 
>  DR3:  DR6: 0ff0 DR7: 0400
>  Stack:
>   8817a5dd5d40 8817a5dd5d9c 8817f38b3a40 8817973a6910
>   0040 8817e8887df8 811b37c2 8817fa23c000
>   8817f57dbb80 88181f232ac0 88181f237500 8817e8887e10
>  Call Trace:
>   [] dput+0x1a2/0x2f0
>   [] cgroup_dput.isra.21+0x1c/0x30
>   [] css_dput_fn+0x1d/0x20
>   [] process_one_work+0x17c/0x460
>   [] worker_thread+0x116/0x3b0
>   [] ? manage_workers.isra.25+0x290/0x290
>   [] kthread+0xc0/0xd0
>   [] ? insert_kthread_work+0x40/0x40
>   [] ret_from_fork+0x58/0x90
>   [] ? insert_kthread_work+0x40/0x40
>  Code: 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 7f 78 48 8b 07 a8 01 74 15
> 48 81 c7 30 01 00 00 48 c7 c6 a0 a7 0c 81 e8 b2 83 02 00 eb c8 <0f> 0b
> 49 8b 4e 18 48 c7 c2 7e f1 7a 81 be 85 03 00 00 48 c7 c7
>  RIP  [] cgroup_diput+0xc0/0xf0
>  RSP 
>  ---[ end trace 85eeea5212c44f51 ]---
> 
> 
> I think there is a css double put in mem_cgroup_iter. Under reclaim,
> we call mem_cgroup_iter the first time with prev == NULL, get the
> last_visited memcg from the per-zone reclaim_iter, and then call
> __mem_cgroup_iter_next to find the next live memcg.
> __mem_cgroup_iter_next can return NULL if last_visited is already the
> last one, so we put last_visited's css and go around the while loop
> again. This time we might not do css_tryget(&last_visited->css) if
> the dead_count has changed, but we still do
> css_put(&last_visited->css). The css is put twice, which can trigger
> the BUG_ON at kernel/cgroup.c:893.

Yes, I guess you are right, and I suspect that this has been silently
fixed by 519ebea3bf6d ("mm: memcontrol: factor out reclaim iterator
loading and updating"). I think a more appropriate fix would be the
following. Are you able to reproduce and re-test it?
---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 437ae2cbe102..0848ec05c12a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1224,6 +1224,8 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 				if (last_visited && last_visited != root &&
 				    !css_tryget(&last_visited->css))
 					last_visited = NULL;
+			} else {
+				last_visited = true;
 			}
 		}
 
-- 
Michal Hocko
SUSE Labs


[RFC PATCH] mm: memcg: fix css double put in mem_cgroup_iter

2017-07-26 Thread Wenwei Tao
From: Wenwei Tao 

By removing the child cgroup while the parent cgroup is
under reclaim, we could trigger the following kernel panic
on kernel 3.10:

kernel BUG at kernel/cgroup.c:893!
 invalid opcode:  [#1] SMP
 CPU: 1 PID: 22477 Comm: kworker/1:1 Not tainted 3.10.107 #1
 Workqueue: cgroup_destroy css_dput_fn
 task: 8817959a5780 ti: 8817e8886000 task.ti: 8817e8886000
 RIP: 0010:[]  []
cgroup_diput+0xc0/0xf0
 RSP: :8817e8887da0  EFLAGS: 00010246
 RAX:  RBX: 8817a5dd5d40 RCX: dead0200
 RDX:  RSI: 8817973a6910 RDI: 8817f54c2a00
 RBP: 8817e8887dc8 R08: 8817a5dd5dd0 R09: df9fb35794b01820
 R10: df9fb35794b01820 R11: 7fa95b1efcda R12: 8817a5dd5d9c
 R13: 8817f38b3a40 R14: 8817973a6910 R15: 8817973a6910
 FS:  () GS:88181f22()
knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7fa6e6234000 CR3: 00179f19d000 CR4: 000407e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: 0ff0 DR7: 0400
 Stack:
  8817a5dd5d40 8817a5dd5d9c 8817f38b3a40 8817973a6910
  0040 8817e8887df8 811b37c2 8817fa23c000
  8817f57dbb80 88181f232ac0 88181f237500 8817e8887e10
 Call Trace:
  [] dput+0x1a2/0x2f0
  [] cgroup_dput.isra.21+0x1c/0x30
  [] css_dput_fn+0x1d/0x20
  [] process_one_work+0x17c/0x460
  [] worker_thread+0x116/0x3b0
  [] ? manage_workers.isra.25+0x290/0x290
  [] kthread+0xc0/0xd0
  [] ? insert_kthread_work+0x40/0x40
  [] ret_from_fork+0x58/0x90
  [] ? insert_kthread_work+0x40/0x40
 Code: 41 5e 41 5f 5d c3 0f 1f 44 00 00 48 8b 7f 78 48 8b 07 a8 01 74 15
48 81 c7 30 01 00 00 48 c7 c6 a0 a7 0c 81 e8 b2 83 02 00 eb c8 <0f> 0b
49 8b 4e 18 48 c7 c2 7e f1 7a 81 be 85 03 00 00 48 c7 c7
 RIP  [] cgroup_diput+0xc0/0xf0
 RSP 
 ---[ end trace 85eeea5212c44f51 ]---


I think there is a css double put in mem_cgroup_iter. Under reclaim,
we call mem_cgroup_iter the first time with prev == NULL, get the
last_visited memcg from the per-zone reclaim_iter, and then call
__mem_cgroup_iter_next to find the next live memcg.
__mem_cgroup_iter_next can return NULL if last_visited is already the
last one, so we put last_visited's css and go around the while loop
again. This time we might not do css_tryget(&last_visited->css) if
the dead_count has changed, but we still do
css_put(&last_visited->css). The css is put twice, which can trigger
the BUG_ON at kernel/cgroup.c:893.

Reported-by: Wang Yu 
Tested-by: Wang Yu 
Signed-off-by: Wenwei Tao 
---
 mm/memcontrol.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 437ae2c..3d7a046 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1230,8 +1230,10 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
 		memcg = __mem_cgroup_iter_next(root, last_visited);
 
 		if (reclaim) {
-			if (last_visited && last_visited != root)
+			if (last_visited && last_visited != root) {
 				css_put(&last_visited->css);
+				last_visited = NULL;
+			}
 
 			iter->last_visited = memcg;
 			smp_wmb();
-- 
1.8.3.1


