On Tue, Dec 15, 2015 at 02:38:58PM -0500, Johannes Weiner wrote:
> On Mon, Dec 14, 2015 at 08:14:55PM +0300, Vladimir Davydov wrote:
> > On Fri, Dec 11, 2015 at 02:54:13PM -0500, Johannes Weiner wrote:
> > ...
> > > -static int
> > > -mem_cgroup_css_online(struct cgroup_subsys_state *css)
> > > +static struct cgroup_subsys_state * __ref
> > > +mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> > >  {
> > > - struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> > > - struct mem_cgroup *parent = mem_cgroup_from_css(css->parent);
> > > - int ret;
> > > -
> > > - if (css->id > MEM_CGROUP_ID_MAX)
> > > -         return -ENOSPC;
> > > + struct mem_cgroup *parent = mem_cgroup_from_css(parent_css);
> > > + struct mem_cgroup *memcg;
> > > + long error = -ENOMEM;
> > >  
> > > - if (!parent)
> > > -         return 0;
> > > + memcg = mem_cgroup_alloc();
> > > + if (!memcg)
> > > +         return ERR_PTR(error);
> > >  
> > >   mutex_lock(&memcg_create_mutex);
> > 
> > It is pointless to take memcg_create_mutex in ->css_alloc. It won't
> > prevent setting use_hierarchy for parent after a new child was
> > allocated, but before it was added to the list of children (see
> > create_css()). Taking the mutex in ->css_online renders this race
> > impossible. That is, your cleanup breaks use_hierarchy consistency
> > check.
> > 
> > Can we drop this use_hierarchy consistency check at all and allow
> > children of a cgroup with use_hierarchy=1 have use_hierarchy=0? Yeah,
> > that might result in some strangeness if cgroups are created in parallel
> > with use_hierarchy flipped, but is it a valid use case? I surmise, one
> > just sets use_hierarchy for a cgroup once and for good before starting
> > to create sub-cgroups.
> 
> I don't think we have to support airtight exclusion between somebody
> changing the parent attribute and creating new children that inherit
> these attributes. Everything will still work if this race happens.
> 
> Does that mean we have to remove the restriction altogether? I'm not
> convinced. We can just keep it for historical purposes so that we do
> not *encourage* this weird setting.

Well, legacy hierarchy is scheduled to die, so it's too late to
encourage or discourage any setting regarding it.

Besides, hierarchy mode must be enabled for 99% setups, because this is
what systemd does at startup. So I don't think we would hurt anybody by
dropping this check altogether - IMO it'd be fairer than having a check
that might sometimes fail.

It's not something I really care about though, so I don't insist.

> 
> I think that's good enough. Let's just remove the memcg_create_mutex.

I'm fine with it, but I think this deserves a comment in the commit
message.

...
> So, how about the following fixlets on top to address your comments?
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index af8714a..124a802 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -250,13 +250,6 @@ enum res_type {
>  /* Used for OOM nofiier */
>  #define OOM_CONTROL          (0)
>  
> -/*
> - * The memcg_create_mutex will be held whenever a new cgroup is created.
> - * As a consequence, any change that needs to protect against new child 
> cgroups
> - * appearing has to hold it as well.
> - */
> -static DEFINE_MUTEX(memcg_create_mutex);
> -
>  /* Some nice accessors for the vmpressure. */
>  struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg)
>  {
> @@ -2660,14 +2653,6 @@ static inline bool memcg_has_children(struct 
> mem_cgroup *memcg)
>  {
>       bool ret;
>  
> -     /*
> -      * The lock does not prevent addition or deletion of children, but
> -      * it prevents a new child from being initialized based on this
> -      * parent in css_online(), so it's enough to decide whether
> -      * hierarchically inherited attributes can still be changed or not.
> -      */
> -     lockdep_assert_held(&memcg_create_mutex);
> -
>       rcu_read_lock();
>       ret = css_next_child(NULL, &memcg->css);
>       rcu_read_unlock();
> @@ -2730,10 +2715,8 @@ static int mem_cgroup_hierarchy_write(struct 
> cgroup_subsys_state *css,
>       struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>       struct mem_cgroup *parent_memcg = 
> mem_cgroup_from_css(memcg->css.parent);
>  
> -     mutex_lock(&memcg_create_mutex);
> -
>       if (memcg->use_hierarchy == val)
> -             goto out;
> +             return 0;
>  
>       /*
>        * If parent's use_hierarchy is set, we can't make any modifications
> @@ -2752,9 +2735,6 @@ static int mem_cgroup_hierarchy_write(struct 
> cgroup_subsys_state *css,
>       } else
>               retval = -EINVAL;
>  
> -out:
> -     mutex_unlock(&memcg_create_mutex);
> -
>       return retval;
>  }
>  
> @@ -2929,6 +2909,10 @@ static void memcg_offline_kmem(struct mem_cgroup 
> *memcg)
>  
>  static void memcg_free_kmem(struct mem_cgroup *memcg)
>  {
> +     /* css_alloc() failed, offlining didn't happen */
> +     if (unlikely(memcg->kmem_state == KMEM_ONLINE))

It's not a hot-path, so there's no need in using 'unlikely' here apart
from improving readability, but the comment should be enough.

> +             memcg_offline_kmem(memcg);
> +

Calling 'offline' from css_free looks a little bit awkward, but let it
be.

Anyway, it's a really nice cleanup, thanks!

Acked-by: Vladimir Davydov <[email protected]>

>       if (memcg->kmem_state == KMEM_ALLOCATED) {
>               memcg_destroy_kmem_caches(memcg);
>               static_branch_dec(&memcg_kmem_enabled_key);
> @@ -2956,11 +2940,9 @@ static int memcg_update_kmem_limit(struct mem_cgroup 
> *memcg,
>       mutex_lock(&memcg_limit_mutex);
>       /* Top-level cgroup doesn't propagate from root */
>       if (!memcg_kmem_online(memcg)) {
> -             mutex_lock(&memcg_create_mutex);
>               if (cgroup_is_populated(memcg->css.cgroup) ||
>                   (memcg->use_hierarchy && memcg_has_children(memcg)))
>                       ret = -EBUSY;
> -             mutex_unlock(&memcg_create_mutex);
>               if (ret)
>                       goto out;
>               ret = memcg_online_kmem(memcg);
> @@ -4184,14 +4166,14 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state 
> *parent_css)
>       if (!memcg)
>               return ERR_PTR(error);
>  
> -     mutex_lock(&memcg_create_mutex);
>       memcg->high = PAGE_COUNTER_MAX;
>       memcg->soft_limit = PAGE_COUNTER_MAX;
> -     if (parent)
> +     if (parent) {
>               memcg->swappiness = mem_cgroup_swappiness(parent);
> +             memcg->oom_kill_disable = parent->oom_kill_disable;
> +     }
>       if (parent && parent->use_hierarchy) {
>               memcg->use_hierarchy = true;
> -             memcg->oom_kill_disable = parent->oom_kill_disable;
>               page_counter_init(&memcg->memory, &parent->memory);
>               page_counter_init(&memcg->memsw, &parent->memsw);
>               page_counter_init(&memcg->kmem, &parent->kmem);
> @@ -4209,7 +4191,6 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state 
> *parent_css)
>               if (parent != root_mem_cgroup)
>                       memory_cgrp_subsys.broken_hierarchy = true;
>       }
> -     mutex_unlock(&memcg_create_mutex);
>  
>       /* The following stuff does not apply to the root */
>       if (!parent) {
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to