Problem 1:
After [1] memory cgroup css can be leaked in cgroup_create(). Suppose
we have succesfult css_alloc() and then percpu_ref_init() fails before
we link css to cgroup in init_cgroup_css(), this way css_free() is not
called in the err_free_all label error path and css is leaked.
To fix that we just need to link css to cgroup first before refcount
handling, this way in err_free_all label we would get proper cleanup.
Note that if percpu_ref_init() fails then percpu_ref_exit() will be a
noop, as it checks css->refcnt->percpu_count_ptr is not NULL.
Problem 2:
Another problem is that even if css is linked via init_cgroup_css()
succesfully after succesful css_alloc(), but then cgroup_create_file()
fails (which we see a lot in multiple crashes of this problem), we get
to err_free_all label error path which calls css_free(). And css_free()
does not cleanup memory cgroup id allocated by css_alloc() in
mem_cgroup_idr.
So we end up with dangling id in mem_cgroup_idr which points to already
freed mem_cgroup. To fix that we can add conditional idr id put into
mem_cgroup_css_free(). It will only trigger on this cleanup error path,
for fully succesfully allocated cgroup the id will be released when last
reference on mem_cgroup is put in mem_cgroup_id_put_many(). Note that we
need to add synchronize_rcu() after removing the id and before freeing
memcg as id to memcg lookup is protected by rcu.
Fixes: 3f28bea872985 ("ms/cgroup: use percpu refcnt for cgroup_subsys_states")
[1]
https://virtuozzo.atlassian.net/browse/PSBM-155867
Signed-off-by: Pavel Tikhomirov <[email protected]>
---
v2: Kostya noticed that mem_cgroup_css_free() requires css to be linked
to cgroup to work correctly, fix that by instead just reordering
percpu_ref_init() and init_cgroup_css().
---
kernel/cgroup.c | 4 ++--
mm/memcontrol.c | 4 ++++
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 348563b2d41f3..59107ea8ae76a 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4950,11 +4950,11 @@ static long cgroup_create(struct cgroup *parent, struct
dentry *dentry,
goto err_free_all;
}
+ init_cgroup_css(css, ss, cgrp);
+
err = percpu_ref_init(&css->refcnt, css_release, 0, GFP_KERNEL);
if (err)
goto err_free_all;
-
- init_cgroup_css(css, ss, cgrp);
}
/*
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f173322fb44eb..58d32254d35ed 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7112,6 +7112,10 @@ static void mem_cgroup_css_free(struct cgroup *cont)
vmpressure_cleanup(&memcg->vmpressure);
memcg_destroy_kmem(memcg);
memcg_free_shrinker_maps(memcg);
+ if (memcg->id.id > 0) {
+ mem_cgroup_id_remove(memcg);
+ synchronize_rcu();
+ }
__mem_cgroup_free(memcg);
}
--
2.51.1
_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel