Re: [PATCH] mm: memcg/slab: properly handle kmem_caches reparented to root_mem_cgroup

2019-06-20 Thread Roman Gushchin
On Thu, Jun 20, 2019 at 08:48:00AM -0700, Shakeel Butt wrote:
> On Wed, Jun 19, 2019 at 6:57 PM Roman Gushchin  wrote:
> >
> > As a result of reparenting, a kmem_cache might belong to the root
> > memory cgroup. This happens when a top-level memory cgroup is removed
> > and all associated kmem_caches are reparented to the root memory
> > cgroup.
> >
> > The root memory cgroup is special and requires special handling.
> > Let's make sure that we don't try to charge or uncharge it, and that
> > we handle system-wide vmstats exactly as for root kmem_caches.
> >
> > Note that we still need to adjust the kmem_cache reference counter,
> > so that the kmem_cache can be released properly.
> >
> > The issue was discovered by running CRIU tests; the following warning
> > appeared:
> >
> > [  381.345960] WARNING: CPU: 0 PID: 11655 at mm/page_counter.c:62
> > page_counter_cancel+0x26/0x30
> > [  381.345992] Modules linked in:
> > [  381.345998] CPU: 0 PID: 11655 Comm: kworker/0:8 Not tainted
> > 5.2.0-rc5-next-20190618+ #1
> > [  381.346001] Hardware name: Google Google Compute Engine/Google
> > Compute Engine, BIOS Google 01/01/2011
> > [  381.346010] Workqueue: memcg_kmem_cache kmemcg_workfn
> > [  381.346013] RIP: 0010:page_counter_cancel+0x26/0x30
> > [  381.346017] Code: 1f 44 00 00 0f 1f 44 00 00 48 89 f0 53 48 f7 d8
> > f0 48 0f c1 07 48 29 f0 48 89 c3 48 89 c6 e8 61 ff ff ff 48 85 db 78
> > 02 5b c3 <0f> 0b 5b c3 66 0f 1f 44 00 00 0f 1f 44 00 00 48 85 ff 74 41
> > 41 55
> > [  381.346019] RSP: 0018:b3b34319f990 EFLAGS: 00010086
> > [  381.346022] RAX: fffc RBX: fffc RCX: 
> > 0004
> > [  381.346024] RDX:  RSI: fffc RDI: 
> > 9c2cd7165270
> > [  381.346026] RBP: 0004 R08:  R09: 
> > 0001
> > [  381.346028] R10: 00c8 R11: 9c2cd684e660 R12: 
> > fffc
> > [  381.346030] R13: 0002 R14: 0006 R15: 
> > 9c2c8ce1f200
> > [  381.346033] FS:  () GS:9c2cd820()
> > knlGS:
> > [  381.346039] CS:  0010 DS:  ES:  CR0: 80050033
> > [  381.346041] CR2: 007be000 CR3: 0001cdbfc005 CR4: 
> > 001606f0
> > [  381.346043] DR0:  DR1:  DR2: 
> > 
> > [  381.346045] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [  381.346047] Call Trace:
> > [  381.346054]  page_counter_uncharge+0x1d/0x30
> > [  381.346065]  __memcg_kmem_uncharge_memcg+0x39/0x60
> > [  381.346071]  __free_slab+0x34c/0x460
> > [  381.346079]  deactivate_slab.isra.80+0x57d/0x6d0
> > [  381.346088]  ? add_lock_to_list.isra.36+0x9c/0xf0
> > [  381.346095]  ? __lock_acquire+0x252/0x1410
> > [  381.346106]  ? cpumask_next_and+0x19/0x20
> > [  381.346110]  ? slub_cpu_dead+0xd0/0xd0
> > [  381.346113]  flush_cpu_slab+0x36/0x50
> > [  381.346117]  ? slub_cpu_dead+0xd0/0xd0
> > [  381.346125]  on_each_cpu_mask+0x51/0x70
> > [  381.346131]  ? ksm_migrate_page+0x60/0x60
> > [  381.346134]  on_each_cpu_cond_mask+0xab/0x100
> > [  381.346143]  __kmem_cache_shrink+0x56/0x320
> > [  381.346150]  ? ret_from_fork+0x3a/0x50
> > [  381.346157]  ? unwind_next_frame+0x73/0x480
> > [  381.346176]  ? __lock_acquire+0x252/0x1410
> > [  381.346188]  ? kmemcg_workfn+0x21/0x50
> > [  381.346196]  ? __mutex_lock+0x99/0x920
> > [  381.346199]  ? kmemcg_workfn+0x21/0x50
> > [  381.346205]  ? kmemcg_workfn+0x21/0x50
> > [  381.346216]  __kmemcg_cache_deactivate_after_rcu+0xe/0x40
> > [  381.346220]  kmemcg_cache_deactivate_after_rcu+0xe/0x20
> > [  381.346223]  kmemcg_workfn+0x31/0x50
> > [  381.346230]  process_one_work+0x23c/0x5e0
> > [  381.346241]  worker_thread+0x3c/0x390
> > [  381.346248]  ? process_one_work+0x5e0/0x5e0
> > [  381.346252]  kthread+0x11d/0x140
> > [  381.346255]  ? kthread_create_on_node+0x60/0x60
> > [  381.346261]  ret_from_fork+0x3a/0x50
> > [  381.346275] irq event stamp: 10302
> > [  381.346278] hardirqs last  enabled at (10301): []
> > _raw_spin_unlock_irq+0x29/0x40
> > [  381.346282] hardirqs last disabled at (10302): []
> > on_each_cpu_mask+0x49/0x70
> > [  381.346287] softirqs last  enabled at (10262): []
> > cgroup_idr_replace+0x3a/0x50
> > [  381.346290] softirqs last disabled at (10260): []
> > cgroup_idr_replace+0x1d/0x50
> > [  381.346293] ---[ end trace b324ba73eb3659f0 ]---
> >
> > Reported-by: Andrei Vagin 
> > Signed-off-by: Roman Gushchin 
> > Cc: Christoph Lameter 
> > Cc: Johannes Weiner 
> > Cc: Michal Hocko 
> > Cc: Shakeel Butt 
> > Cc: Vladimir Davydov 
> > Cc: Waiman Long 
> > Cc: David Rientjes 
> > Cc: Joonsoo Kim 
> > Cc: Pekka Enberg 
> > ---
> >  mm/slab.h | 17 +
> >  1 file changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/slab.h b/mm/slab.h
> > index a4c9b9d042de..c02e7f44268b 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -294,8 +294,12 @@ static __always_inline int memcg_charge_slab(struct page *page,
> > memcg = parent_mem_cgroup(memcg);
> > rcu_read_unlock();
> >
> > -   if (unlikely(!memcg))
> > +   if (unlikely(!memcg || mem_cgroup_is_root(memcg))) {
> > +   mod_node_page_state(page_pgdat(page), cache_vmstat_idx(s),
> > +