[CCing Greg who I forgot to add the to list - sorry about that. The
thread starts here: http://marc.info/?l=linux-mm&m=143264102317318&w=2]

On Tue 26-05-15 10:10:11, Johannes Weiner wrote:
> On Tue, May 26, 2015 at 01:50:06PM +0200, Michal Hocko wrote:
> > Please note that this patch introduces a USER VISIBLE CHANGE OF BEHAVIOR.
> > Without mm->owner _all_ tasks associated with the mm_struct would
> > initiate memcg migration while previously only owner of the mm_struct
> > could do that. The original behavior was awkward though because the user
> > task didn't have any means to find out the current owner (esp. after
> > mm_update_next_owner) so the migration behavior was not well defined
> > in general.
> > New cgroup API (unified hierarchy) will discontinue tasks file which
> > means that migrating threads will no longer be possible. In such a case
> > having CLONE_VM without CLONE_THREAD could emulate the thread behavior
> > but this patch prevents from isolating memcg controllers from others.
> > Nevertheless I am not convinced such a use case would really deserve
> > complications on the memcg code side.
> 
> I think such a change is okay.  The memcg semantics of moving threads
> with the same mm into separate groups have always been arbitrary.  No
> reasonable behavior can be expected of this, so what sane real life
> usecase would rely on it?

I can imagine that threads would go to different cgroups because of
other controllers (e.g. cpu or cpuset).
AFAIR google was doing threads distribution.

> > @@ -104,7 +105,12 @@ static inline bool mm_match_cgroup(struct mm_struct 
> > *mm,
> >     bool match = false;
> >  
> >     rcu_read_lock();
> > -   task_memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> > +   /*
> > +    * rcu_dereference would be better but mem_cgroup is not a complete
> > +    * type here
> > +    */
> > +   task_memcg = READ_ONCE(mm->memcg);
> > +   smp_read_barrier_depends();
> >     if (task_memcg)
> >             match = mem_cgroup_is_descendant(task_memcg, memcg);
> >     rcu_read_unlock();
> 
> This function has only one user in rmap.  If you inline it there, you
> can use rcu_dereference() and get rid of the specialness & comment.

I am not sure I understand. struct mem_cgroup is defined in
mm/memcontrol.c so mm/rmap.c will not see it. Or do you suggest pulling
struct mem_cgroup out into a header with all the dependencies?

> > @@ -195,6 +201,10 @@ void mem_cgroup_split_huge_fixup(struct page *head);
> >  #else /* CONFIG_MEMCG */
> >  struct mem_cgroup;
> >  
> > +void mm_drop_memcg(struct mm_struct *mm)
> > +{}
> > +void mm_set_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
> > +{}
> 
> static inline?

Of course. Fixed.
 
[...]
> > @@ -469,6 +469,46 @@ static inline struct mem_cgroup 
> > *mem_cgroup_from_id(unsigned short id)
> >     return mem_cgroup_from_css(css);
> >  }
> >  
> > +static struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
> > +{
> > +   if (!p->mm)
> > +           return NULL;
> > +   return rcu_dereference(p->mm->memcg);
> > +}
> > +
> > +void mm_set_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
> > +{
> > +   if (memcg)
> > +           css_get(&memcg->css);
> > +   rcu_assign_pointer(mm->memcg, memcg);
> > +}
> > +
> > +void mm_drop_memcg(struct mm_struct *mm)
> > +{
> > +   /*
> > +    * This is the last reference to mm so nobody can see
> > +    * this memcg
> > +    */
> > +   if (mm->memcg)
> > +           css_put(&mm->memcg->css);
> > +}
> 
> This is really simple and obvious and has only one caller, it would be
> better to inline this into mmput().  The comment would also be easier
> to understand in conjunction with the mmdrop() in the callsite:

Same case as rmap.c.

> 
>       if (mm->memcg)
>               css_put(&mm->memcg->css);
>       /* We could reset mm->memcg, but this will free the mm: */
>       mmdrop(mm);

I like your comment more. I will update it

> 
> The same goes for mm_set_memcg, there is no real need for obscuring a
> simple get-and-store.
> 
> > +static void mm_move_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
> > +{
> > +   struct mem_cgroup *old_memcg;
> > +
> > +   mm_set_memcg(mm, memcg);
> > +
> > +   /*
> > +    * wait for all current users of the old memcg before we
> > +    * release the reference.
> > +    */
> > +   old_memcg = mm->memcg;

Doh. Last minute changes... This is incorrect, of course, because I am
dropping the new memcg reference. Fixed

> > +   synchronize_rcu();
> > +   if (old_memcg)
> > +           css_put(&old_memcg->css);
> > +}
> 
> I'm not sure why we need that synchronize_rcu() in here, the css is
> itself protected by RCU and a failing tryget will prevent you from
> taking it outside a RCU-locked region.

Yeah, you are right. Removed.

> Aside from that, there is again exactly one place that performs this
> operation.  Please inline it into mem_cgroup_move_task().

OK, I will inline it there.

> > @@ -5204,6 +5251,12 @@ static void mem_cgroup_move_task(struct 
> > cgroup_subsys_state *css,
> >     struct mm_struct *mm = get_task_mm(p);
> >  
> >     if (mm) {
> > +           /*
> > +            * Commit to a new memcg. mc.to points to the destination
> > +            * memcg even when the current charges are not moved.
> > +            */
> > +           mm_move_memcg(mm, mc.to);
> > +
> >             if (mc_move_charge())
> >                     mem_cgroup_move_charge(mm);
> >             mmput(mm);
> 
> It's a little weird to use mc.to when not moving charges, as "mc"
> stands for "move charge".  Why not derive the destination from @css,
> just like can_attach does?  It's a mere cast.  That also makes patch
> #2 in your series unnecessary.

Good idea!

> Otherwise, the patch looks great to me.

Thanks for the review. Changes based on your feedback:
---
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 315ec1e58acb..50cf88c0249d 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -201,10 +201,12 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 #else /* CONFIG_MEMCG */
 struct mem_cgroup;
 
-void mm_drop_memcg(struct mm_struct *mm)
-{}
-void mm_set_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
-{}
+static inline void mm_drop_memcg(struct mm_struct *mm)
+{
+}
+static inline void mm_set_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
+{
+}
 static inline void mem_cgroup_events(struct mem_cgroup *memcg,
                                     enum mem_cgroup_events_index idx,
                                     unsigned int nr)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 950875eb7d89..2c5c336aca6e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -486,29 +486,13 @@ void mm_set_memcg(struct mm_struct *mm, struct mem_cgroup 
*memcg)
 void mm_drop_memcg(struct mm_struct *mm)
 {
        /*
-        * This is the last reference to mm so nobody can see
-        * this memcg
+        * We could reset mm->memcg, but the mm goes away as this is the
+        * last reference.
         */
        if (mm->memcg)
                css_put(&mm->memcg->css);
 }
 
-static void mm_move_memcg(struct mm_struct *mm, struct mem_cgroup *memcg)
-{
-       struct mem_cgroup *old_memcg;
-
-       mm_set_memcg(mm, memcg);
-
-       /*
-        * wait for all current users of the old memcg before we
-        * release the reference.
-        */
-       old_memcg = mm->memcg;
-       synchronize_rcu();
-       if (old_memcg)
-               css_put(&old_memcg->css);
-}
-
 /* Writing them here to avoid exposing memcg's inner layout */
 #if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
 
@@ -5252,10 +5236,15 @@ static void mem_cgroup_move_task(struct 
cgroup_subsys_state *css,
 
        if (mm) {
                /*
-                * Commit to a new memcg. mc.to points to the destination
-                * memcg even when the current charges are not moved.
+                * Commit to the target memcg even when we do not move
+                * charges.
                 */
-               mm_move_memcg(mm, mc.to);
+               struct mem_cgroup *old_memcg = READ_ONCE(mm->memcg);
+               struct mem_cgroup *new_memcg = mem_cgroup_from_css(css);
+
+               mm_set_memcg(mm, new_memcg);
+               if (old_memcg)
+                       css_put(&old_memcg->css);
 
                if (mc_move_charge())
                        mem_cgroup_move_charge(mm);
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to