Re: [PATCH 3/6] memcg: Simplify mem_cgroup_force_empty_list error handling

2012-10-19 Thread Tejun Heo
Hello, Michal.

On Fri, Oct 19, 2012 at 03:24:38PM +0200, Michal Hocko wrote:
> > Maybe convert to proper /** function comment while at it?  
> 
> these are internal functions and we usually do not create kerneldoc for
> them. But I can surely change it - it would deserve a bigger clean up
> then.

Yeah, I got into the habit of making function comments kerneldoc if
the function is important / scary enough.  It's upto you but I think
that would be an improvement here.

> What about:
> "
>  * Although this might fail (get_page_unless_zero, isolate_lru_page or
>  * mem_cgroup_move_account fails) the failure is always temporary and
>  * it signals a race with a page removal/uncharge or migration. In the
>  * first case the page is on the way out and it will vanish from the LRU
>  * on the next attempt and the call should be retried later.
>  * Isolation from the LRU fails only if page has been isolated from
>  * the LRU since we looked at it and that usually means either global
>  * reclaim or migration going on. The page will either get back to the
>  * LRU or vanish.
>  * Finaly mem_cgroup_move_account fails only if the page got uncharged
>  * (!PageCgroupUsed) or moved to a different group. The page will
>  * disappear in the next attempt.
> "
> 
> Better? Or should it rather be in the changelog?

Looks good to me and I personally think it deserves to be a comment.

> > Is there anything which can keep failing until migration to another
> > cgroup is complete?  
> 
> This is not about migration to another cgroup. Remember there are no
> tasks in the group so we have no origin for the migration. I was talking
> about migrate_pages.
> 
> > I think there is, e.g., if mmap_sem is busy or memcg is co-mounted
> > with other controllers and another controller's ->attach() is blocking
> > on something.
> 
> I am not sure I understand your concern. There are no tasks and we will
> break out the loop if some appear. And yes we can retry a lot in
> pathological cases. But this is a group removal path which is not hot.

Ah, okay, I misunderstood that it could wait for task cgroup
migration.

> > If so, busy-looping blindly probably isn't a good idea and we would
> > want at least msleep between retries (e.g. have two lists, throw
> > failed ones to the other and sleep shortly when switching the front
> > and back lists).
> 
> we do cond_resched if we fail.

If it won't ever spin for someone else sleeping, I think it should be
fine.

> > Maybe we want to trigger some warning if retry count gets too high?
> > At least for now?
> 
> We can but is this really worth it?

I don't know.  My sense of danger here is likely to be way off
compared to yours so if you think it's a fairly safe loop, it probably
is.

It just reminds me of the busy looping we had in freezer.  It was
correct but actually manifested as a problem - when a system was going
down for emergency hibernation from low battery, that busy loop not
too rarely drained the small reserve making the machine lose power
before completing hibernation.  So, it could be that I'm a bit
paranoid here.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/6] memcg: Simplify mem_cgroup_force_empty_list error handling

2012-10-19 Thread Michal Hocko
On Thu 18-10-12 15:16:54, Tejun Heo wrote:
> Hello, Michal.
> 
> On Wed, Oct 17, 2012 at 03:30:45PM +0200, Michal Hocko wrote:
> > mem_cgroup_force_empty_list currently tries to remove all pages from
> > the given LRU. To prevent from temoporary failures (EBUSY returned by
> > mem_cgroup_move_parent) it uses a margin to the current LRU pages and
> > returns the true if there are still some pages left on the list.
> > 
> > If we consider that mem_cgroup_move_parent fails only when we are racing
> > with somebody else removing the page (resp. uncharging it) or when the
> > page is migrated then it is obvious that all those failures are only
> > temporal and so we can safely retry later.
> > Let's get rid of the safety margin and make the loop really wait for the
> > empty LRU. The caller should still make sure that all charges have been
> > removed from the res_counter because mem_cgroup_replace_page_cache might
> > add a page to the LRU after the check (it doesn't touch res_counter
> > though).
> > This catches most of the cases except for shmem which might call
> > mem_cgroup_replace_page_cache with a page which is not charged and on
> > the LRU yet but this was the case also without this patch. In order to
> > fix this we need a guarantee that try_get_mem_cgroup_from_page falls
> > back to the current mm's cgroup so it needs css_tryget to fail. This
> > will be fixed up in a later patch because it nees a help from cgroup
> > core.
> > 
> > Signed-off-by: Michal Hocko 
> 
> In the sense that "I looked at it and nothing seemed too scary".
> 
>  Reviewed-by: Tejun Heo 

Thanks

> 
> Some nitpicks below.
> 
> >  /*
> > - * move charges to its parent.
> > + * move charges to its parent or the root cgroup if the group
> > + * has no parent (aka use_hierarchy==0).
> > + * Although this might fail the failure is always temporary and it
> > + * signals a race with a page removal/uncharge or migration. In the
> > + * first case the page will vanish from the LRU on the next attempt
> > + * and the call should be retried later.
> >   */
> > -
> 
> Maybe convert to proper /** function comment while at it?  

these are internal functions and we usually do not create kerneldoc for
them. But I can surely change it - it would deserve a bigger clean up
then.

> I also think it would be helpful to actually comment on each possible
> failure case explaining why the failure condition is temporal.

What about:
"
 * Although this might fail (get_page_unless_zero, isolate_lru_page or
 * mem_cgroup_move_account fails) the failure is always temporary and
 * it signals a race with a page removal/uncharge or migration. In the
 * first case the page is on the way out and it will vanish from the LRU
 * on the next attempt and the call should be retried later.
 * Isolation from the LRU fails only if page has been isolated from
 * the LRU since we looked at it and that usually means either global
 * reclaim or migration going on. The page will either get back to the
 * LRU or vanish.
 * Finaly mem_cgroup_move_account fails only if the page got uncharged
 * (!PageCgroupUsed) or moved to a different group. The page will
 * disappear in the next attempt.
"

Better? Or should it rather be in the changelog?

> 
> >  /*
> >   * Traverse a specified page_cgroup list and try to drop them all.  This 
> > doesn't
> > - * reclaim the pages page themselves - it just removes the page_cgroups.
> > - * Returns true if some page_cgroups were not freed, indicating that the 
> > caller
> > - * must retry this operation.
> > + * reclaim the pages page themselves - pages are moved to the parent (or 
> > root)
> > + * group.
> >   */
> 
> Ditto.
> 
> > -static bool mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
> > +static void mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
> > int node, int zid, enum lru_list lru)
> >  {
> > struct mem_cgroup_per_zone *mz;
> > -   unsigned long flags, loop;
> > +   unsigned long flags;
> > struct list_head *list;
> > struct page *busy;
> > struct zone *zone;
> > @@ -3696,11 +3701,8 @@ static bool mem_cgroup_force_empty_list(struct 
> > mem_cgroup *memcg,
> > mz = mem_cgroup_zoneinfo(memcg, node, zid);
> > list = >lruvec.lists[lru];
> >  
> > -   loop = mz->lru_size[lru];
> > -   /* give some margin against EBUSY etc...*/
> > -   loop += 256;
> > busy = NULL;
> > -   while (loop--) {
> > +   do {
> > struct page_cgroup *pc;
> > struct page *page;
> >  
> > @@ -3726,8 +3728,7 @@ static bool mem_cgroup_force_empty_list(struct 
> > mem_cgroup *memcg,
> > cond_resched();
> > } else
> > busy = NULL;
> > -   }
> > -   return !list_empty(list);
> > +   } while (!list_empty(list));
> >  }
> 
> Is there anything which can keep failing until migration to another
> cgroup is complete?  

This is not about migration to another cgroup. Remember there are no
tasks in the group so we have no 

Re: [PATCH 3/6] memcg: Simplify mem_cgroup_force_empty_list error handling

2012-10-19 Thread Michal Hocko
On Thu 18-10-12 15:16:54, Tejun Heo wrote:
 Hello, Michal.
 
 On Wed, Oct 17, 2012 at 03:30:45PM +0200, Michal Hocko wrote:
  mem_cgroup_force_empty_list currently tries to remove all pages from
  the given LRU. To prevent from temoporary failures (EBUSY returned by
  mem_cgroup_move_parent) it uses a margin to the current LRU pages and
  returns the true if there are still some pages left on the list.
  
  If we consider that mem_cgroup_move_parent fails only when we are racing
  with somebody else removing the page (resp. uncharging it) or when the
  page is migrated then it is obvious that all those failures are only
  temporal and so we can safely retry later.
  Let's get rid of the safety margin and make the loop really wait for the
  empty LRU. The caller should still make sure that all charges have been
  removed from the res_counter because mem_cgroup_replace_page_cache might
  add a page to the LRU after the check (it doesn't touch res_counter
  though).
  This catches most of the cases except for shmem which might call
  mem_cgroup_replace_page_cache with a page which is not charged and on
  the LRU yet but this was the case also without this patch. In order to
  fix this we need a guarantee that try_get_mem_cgroup_from_page falls
  back to the current mm's cgroup so it needs css_tryget to fail. This
  will be fixed up in a later patch because it nees a help from cgroup
  core.
  
  Signed-off-by: Michal Hocko mho...@suse.cz
 
 In the sense that I looked at it and nothing seemed too scary.
 
  Reviewed-by: Tejun Heo t...@kernel.org

Thanks

 
 Some nitpicks below.
 
   /*
  - * move charges to its parent.
  + * move charges to its parent or the root cgroup if the group
  + * has no parent (aka use_hierarchy==0).
  + * Although this might fail the failure is always temporary and it
  + * signals a race with a page removal/uncharge or migration. In the
  + * first case the page will vanish from the LRU on the next attempt
  + * and the call should be retried later.
*/
  -
 
 Maybe convert to proper /** function comment while at it?  

these are internal functions and we usually do not create kerneldoc for
them. But I can surely change it - it would deserve a bigger clean up
then.

 I also think it would be helpful to actually comment on each possible
 failure case explaining why the failure condition is temporal.

What about:

 * Although this might fail (get_page_unless_zero, isolate_lru_page or
 * mem_cgroup_move_account fails) the failure is always temporary and
 * it signals a race with a page removal/uncharge or migration. In the
 * first case the page is on the way out and it will vanish from the LRU
 * on the next attempt and the call should be retried later.
 * Isolation from the LRU fails only if page has been isolated from
 * the LRU since we looked at it and that usually means either global
 * reclaim or migration going on. The page will either get back to the
 * LRU or vanish.
 * Finaly mem_cgroup_move_account fails only if the page got uncharged
 * (!PageCgroupUsed) or moved to a different group. The page will
 * disappear in the next attempt.


Better? Or should it rather be in the changelog?

 
   /*
* Traverse a specified page_cgroup list and try to drop them all.  This 
  doesn't
  - * reclaim the pages page themselves - it just removes the page_cgroups.
  - * Returns true if some page_cgroups were not freed, indicating that the 
  caller
  - * must retry this operation.
  + * reclaim the pages page themselves - pages are moved to the parent (or 
  root)
  + * group.
*/
 
 Ditto.
 
  -static bool mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
  +static void mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
  int node, int zid, enum lru_list lru)
   {
  struct mem_cgroup_per_zone *mz;
  -   unsigned long flags, loop;
  +   unsigned long flags;
  struct list_head *list;
  struct page *busy;
  struct zone *zone;
  @@ -3696,11 +3701,8 @@ static bool mem_cgroup_force_empty_list(struct 
  mem_cgroup *memcg,
  mz = mem_cgroup_zoneinfo(memcg, node, zid);
  list = mz-lruvec.lists[lru];
   
  -   loop = mz-lru_size[lru];
  -   /* give some margin against EBUSY etc...*/
  -   loop += 256;
  busy = NULL;
  -   while (loop--) {
  +   do {
  struct page_cgroup *pc;
  struct page *page;
   
  @@ -3726,8 +3728,7 @@ static bool mem_cgroup_force_empty_list(struct 
  mem_cgroup *memcg,
  cond_resched();
  } else
  busy = NULL;
  -   }
  -   return !list_empty(list);
  +   } while (!list_empty(list));
   }
 
 Is there anything which can keep failing until migration to another
 cgroup is complete?  

This is not about migration to another cgroup. Remember there are no
tasks in the group so we have no origin for the migration. I was talking
about migrate_pages.

 I think there is, e.g., if mmap_sem is busy or memcg is co-mounted
 with other 

Re: [PATCH 3/6] memcg: Simplify mem_cgroup_force_empty_list error handling

2012-10-19 Thread Tejun Heo
Hello, Michal.

On Fri, Oct 19, 2012 at 03:24:38PM +0200, Michal Hocko wrote:
  Maybe convert to proper /** function comment while at it?  
 
 these are internal functions and we usually do not create kerneldoc for
 them. But I can surely change it - it would deserve a bigger clean up
 then.

Yeah, I got into the habit of making function comments kerneldoc if
the function is important / scary enough.  It's upto you but I think
that would be an improvement here.

 What about:
 
  * Although this might fail (get_page_unless_zero, isolate_lru_page or
  * mem_cgroup_move_account fails) the failure is always temporary and
  * it signals a race with a page removal/uncharge or migration. In the
  * first case the page is on the way out and it will vanish from the LRU
  * on the next attempt and the call should be retried later.
  * Isolation from the LRU fails only if page has been isolated from
  * the LRU since we looked at it and that usually means either global
  * reclaim or migration going on. The page will either get back to the
  * LRU or vanish.
  * Finaly mem_cgroup_move_account fails only if the page got uncharged
  * (!PageCgroupUsed) or moved to a different group. The page will
  * disappear in the next attempt.
 
 
 Better? Or should it rather be in the changelog?

Looks good to me and I personally think it deserves to be a comment.

  Is there anything which can keep failing until migration to another
  cgroup is complete?  
 
 This is not about migration to another cgroup. Remember there are no
 tasks in the group so we have no origin for the migration. I was talking
 about migrate_pages.
 
  I think there is, e.g., if mmap_sem is busy or memcg is co-mounted
  with other controllers and another controller's -attach() is blocking
  on something.
 
 I am not sure I understand your concern. There are no tasks and we will
 break out the loop if some appear. And yes we can retry a lot in
 pathological cases. But this is a group removal path which is not hot.

Ah, okay, I misunderstood that it could wait for task cgroup
migration.

  If so, busy-looping blindly probably isn't a good idea and we would
  want at least msleep between retries (e.g. have two lists, throw
  failed ones to the other and sleep shortly when switching the front
  and back lists).
 
 we do cond_resched if we fail.

If it won't ever spin for someone else sleeping, I think it should be
fine.

  Maybe we want to trigger some warning if retry count gets too high?
  At least for now?
 
 We can but is this really worth it?

I don't know.  My sense of danger here is likely to be way off
compared to yours so if you think it's a fairly safe loop, it probably
is.

It just reminds me of the busy looping we had in freezer.  It was
correct but actually manifested as a problem - when a system was going
down for emergency hibernation from low battery, that busy loop not
too rarely drained the small reserve making the machine lose power
before completing hibernation.  So, it could be that I'm a bit
paranoid here.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/6] memcg: Simplify mem_cgroup_force_empty_list error handling

2012-10-18 Thread Tejun Heo
Hello, Michal.

On Wed, Oct 17, 2012 at 03:30:45PM +0200, Michal Hocko wrote:
> mem_cgroup_force_empty_list currently tries to remove all pages from
> the given LRU. To prevent from temoporary failures (EBUSY returned by
> mem_cgroup_move_parent) it uses a margin to the current LRU pages and
> returns the true if there are still some pages left on the list.
> 
> If we consider that mem_cgroup_move_parent fails only when we are racing
> with somebody else removing the page (resp. uncharging it) or when the
> page is migrated then it is obvious that all those failures are only
> temporal and so we can safely retry later.
> Let's get rid of the safety margin and make the loop really wait for the
> empty LRU. The caller should still make sure that all charges have been
> removed from the res_counter because mem_cgroup_replace_page_cache might
> add a page to the LRU after the check (it doesn't touch res_counter
> though).
> This catches most of the cases except for shmem which might call
> mem_cgroup_replace_page_cache with a page which is not charged and on
> the LRU yet but this was the case also without this patch. In order to
> fix this we need a guarantee that try_get_mem_cgroup_from_page falls
> back to the current mm's cgroup so it needs css_tryget to fail. This
> will be fixed up in a later patch because it nees a help from cgroup
> core.
> 
> Signed-off-by: Michal Hocko 

In the sense that "I looked at it and nothing seemed too scary".

 Reviewed-by: Tejun Heo 

Some nitpicks below.

>  /*
> - * move charges to its parent.
> + * move charges to its parent or the root cgroup if the group
> + * has no parent (aka use_hierarchy==0).
> + * Although this might fail the failure is always temporary and it
> + * signals a race with a page removal/uncharge or migration. In the
> + * first case the page will vanish from the LRU on the next attempt
> + * and the call should be retried later.
>   */
> -

Maybe convert to proper /** function comment while at it?  I also
think it would be helpful to actually comment on each possible failure
case explaining why the failure condition is temporal.

>  /*
>   * Traverse a specified page_cgroup list and try to drop them all.  This 
> doesn't
> - * reclaim the pages page themselves - it just removes the page_cgroups.
> - * Returns true if some page_cgroups were not freed, indicating that the 
> caller
> - * must retry this operation.
> + * reclaim the pages page themselves - pages are moved to the parent (or 
> root)
> + * group.
>   */

Ditto.

> -static bool mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
> +static void mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
>   int node, int zid, enum lru_list lru)
>  {
>   struct mem_cgroup_per_zone *mz;
> - unsigned long flags, loop;
> + unsigned long flags;
>   struct list_head *list;
>   struct page *busy;
>   struct zone *zone;
> @@ -3696,11 +3701,8 @@ static bool mem_cgroup_force_empty_list(struct 
> mem_cgroup *memcg,
>   mz = mem_cgroup_zoneinfo(memcg, node, zid);
>   list = >lruvec.lists[lru];
>  
> - loop = mz->lru_size[lru];
> - /* give some margin against EBUSY etc...*/
> - loop += 256;
>   busy = NULL;
> - while (loop--) {
> + do {
>   struct page_cgroup *pc;
>   struct page *page;
>  
> @@ -3726,8 +3728,7 @@ static bool mem_cgroup_force_empty_list(struct 
> mem_cgroup *memcg,
>   cond_resched();
>   } else
>   busy = NULL;
> - }
> - return !list_empty(list);
> + } while (!list_empty(list));
>  }

Is there anything which can keep failing until migration to another
cgroup is complete?  I think there is, e.g., if mmap_sem is busy or
memcg is co-mounted with other controllers and another controller's
->attach() is blocking on something.

If so, busy-looping blindly probably isn't a good idea and we would
want at least msleep between retries (e.g. have two lists, throw
failed ones to the other and sleep shortly when switching the front
and back lists).

> + /*
> +  * This is a safety check because mem_cgroup_force_empty_list
> +  * could have raced with mem_cgroup_replace_page_cache callers
> +  * so the lru seemed empty but the page could have been added
> +  * right after the check. RES_USAGE should be safe as we always
> +  * charge before adding to the LRU.
> +  */
> + } while (res_counter_read_u64(>res, RES_USAGE) > 0);

Maybe we want to trigger some warning if retry count gets too high?
At least for now?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/6] memcg: Simplify mem_cgroup_force_empty_list error handling

2012-10-18 Thread Tejun Heo
Hello, Michal.

On Wed, Oct 17, 2012 at 03:30:45PM +0200, Michal Hocko wrote:
 mem_cgroup_force_empty_list currently tries to remove all pages from
 the given LRU. To prevent from temoporary failures (EBUSY returned by
 mem_cgroup_move_parent) it uses a margin to the current LRU pages and
 returns the true if there are still some pages left on the list.
 
 If we consider that mem_cgroup_move_parent fails only when we are racing
 with somebody else removing the page (resp. uncharging it) or when the
 page is migrated then it is obvious that all those failures are only
 temporal and so we can safely retry later.
 Let's get rid of the safety margin and make the loop really wait for the
 empty LRU. The caller should still make sure that all charges have been
 removed from the res_counter because mem_cgroup_replace_page_cache might
 add a page to the LRU after the check (it doesn't touch res_counter
 though).
 This catches most of the cases except for shmem which might call
 mem_cgroup_replace_page_cache with a page which is not charged and on
 the LRU yet but this was the case also without this patch. In order to
 fix this we need a guarantee that try_get_mem_cgroup_from_page falls
 back to the current mm's cgroup so it needs css_tryget to fail. This
 will be fixed up in a later patch because it nees a help from cgroup
 core.
 
 Signed-off-by: Michal Hocko mho...@suse.cz

In the sense that I looked at it and nothing seemed too scary.

 Reviewed-by: Tejun Heo t...@kernel.org

Some nitpicks below.

  /*
 - * move charges to its parent.
 + * move charges to its parent or the root cgroup if the group
 + * has no parent (aka use_hierarchy==0).
 + * Although this might fail the failure is always temporary and it
 + * signals a race with a page removal/uncharge or migration. In the
 + * first case the page will vanish from the LRU on the next attempt
 + * and the call should be retried later.
   */
 -

Maybe convert to proper /** function comment while at it?  I also
think it would be helpful to actually comment on each possible failure
case explaining why the failure condition is temporal.

  /*
   * Traverse a specified page_cgroup list and try to drop them all.  This 
 doesn't
 - * reclaim the pages page themselves - it just removes the page_cgroups.
 - * Returns true if some page_cgroups were not freed, indicating that the 
 caller
 - * must retry this operation.
 + * reclaim the pages page themselves - pages are moved to the parent (or 
 root)
 + * group.
   */

Ditto.

 -static bool mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
 +static void mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
   int node, int zid, enum lru_list lru)
  {
   struct mem_cgroup_per_zone *mz;
 - unsigned long flags, loop;
 + unsigned long flags;
   struct list_head *list;
   struct page *busy;
   struct zone *zone;
 @@ -3696,11 +3701,8 @@ static bool mem_cgroup_force_empty_list(struct 
 mem_cgroup *memcg,
   mz = mem_cgroup_zoneinfo(memcg, node, zid);
   list = mz-lruvec.lists[lru];
  
 - loop = mz-lru_size[lru];
 - /* give some margin against EBUSY etc...*/
 - loop += 256;
   busy = NULL;
 - while (loop--) {
 + do {
   struct page_cgroup *pc;
   struct page *page;
  
 @@ -3726,8 +3728,7 @@ static bool mem_cgroup_force_empty_list(struct 
 mem_cgroup *memcg,
   cond_resched();
   } else
   busy = NULL;
 - }
 - return !list_empty(list);
 + } while (!list_empty(list));
  }

Is there anything which can keep failing until migration to another
cgroup is complete?  I think there is, e.g., if mmap_sem is busy or
memcg is co-mounted with other controllers and another controller's
-attach() is blocking on something.

If so, busy-looping blindly probably isn't a good idea and we would
want at least msleep between retries (e.g. have two lists, throw
failed ones to the other and sleep shortly when switching the front
and back lists).

 + /*
 +  * This is a safety check because mem_cgroup_force_empty_list
 +  * could have raced with mem_cgroup_replace_page_cache callers
 +  * so the lru seemed empty but the page could have been added
 +  * right after the check. RES_USAGE should be safe as we always
 +  * charge before adding to the LRU.
 +  */
 + } while (res_counter_read_u64(memcg-res, RES_USAGE)  0);

Maybe we want to trigger some warning if retry count gets too high?
At least for now?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/6] memcg: Simplify mem_cgroup_force_empty_list error handling

2012-10-17 Thread Michal Hocko
mem_cgroup_force_empty_list currently tries to remove all pages from
the given LRU. To prevent from temoporary failures (EBUSY returned by
mem_cgroup_move_parent) it uses a margin to the current LRU pages and
returns the true if there are still some pages left on the list.

If we consider that mem_cgroup_move_parent fails only when we are racing
with somebody else removing the page (resp. uncharging it) or when the
page is migrated then it is obvious that all those failures are only
temporal and so we can safely retry later.
Let's get rid of the safety margin and make the loop really wait for the
empty LRU. The caller should still make sure that all charges have been
removed from the res_counter because mem_cgroup_replace_page_cache might
add a page to the LRU after the check (it doesn't touch res_counter
though).
This catches most of the cases except for shmem which might call
mem_cgroup_replace_page_cache with a page which is not charged and on
the LRU yet but this was the case also without this patch. In order to
fix this we need a guarantee that try_get_mem_cgroup_from_page falls
back to the current mm's cgroup so it needs css_tryget to fail. This
will be fixed up in a later patch because it nees a help from cgroup
core.

Signed-off-by: Michal Hocko 
---
 mm/memcontrol.c |   52 +++-
 1 file changed, 27 insertions(+), 25 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9ce24b7..f57ba4c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2697,9 +2697,13 @@ out:
 }
 
 /*
- * move charges to its parent.
+ * move charges to its parent or the root cgroup if the group
+ * has no parent (aka use_hierarchy==0).
+ * Although this might fail the failure is always temporary and it
+ * signals a race with a page removal/uncharge or migration. In the
+ * first case the page will vanish from the LRU on the next attempt
+ * and the call should be retried later.
  */
-
 static int mem_cgroup_move_parent(struct page *page,
  struct page_cgroup *pc,
  struct mem_cgroup *child)
@@ -2726,8 +2730,10 @@ static int mem_cgroup_move_parent(struct page *page,
if (!parent)
parent = root_mem_cgroup;
 
-   if (nr_pages > 1)
+   if (nr_pages > 1) {
+   VM_BUG_ON(!PageTransHuge(page));
flags = compound_lock_irqsave(page);
+   }
 
ret = mem_cgroup_move_account(page, nr_pages,
pc, child, parent);
@@ -3679,15 +3685,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone 
*zone, int order,
 
 /*
  * Traverse a specified page_cgroup list and try to drop them all.  This 
doesn't
- * reclaim the pages page themselves - it just removes the page_cgroups.
- * Returns true if some page_cgroups were not freed, indicating that the caller
- * must retry this operation.
+ * reclaim the pages page themselves - pages are moved to the parent (or root)
+ * group.
  */
-static bool mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
+static void mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
int node, int zid, enum lru_list lru)
 {
struct mem_cgroup_per_zone *mz;
-   unsigned long flags, loop;
+   unsigned long flags;
struct list_head *list;
struct page *busy;
struct zone *zone;
@@ -3696,11 +3701,8 @@ static bool mem_cgroup_force_empty_list(struct 
mem_cgroup *memcg,
mz = mem_cgroup_zoneinfo(memcg, node, zid);
list = >lruvec.lists[lru];
 
-   loop = mz->lru_size[lru];
-   /* give some margin against EBUSY etc...*/
-   loop += 256;
busy = NULL;
-   while (loop--) {
+   do {
struct page_cgroup *pc;
struct page *page;
 
@@ -3726,8 +3728,7 @@ static bool mem_cgroup_force_empty_list(struct mem_cgroup 
*memcg,
cond_resched();
} else
busy = NULL;
-   }
-   return !list_empty(list);
+   } while (!list_empty(list));
 }
 
 /*
@@ -3741,7 +3742,6 @@ static int mem_cgroup_reparent_charges(struct mem_cgroup 
*memcg)
 {
struct cgroup *cgrp = memcg->css.cgroup;
int node, zid;
-   int ret;
 
do {
if (cgroup_task_count(cgrp) || !list_empty(>children))
@@ -3749,28 +3749,30 @@ static int mem_cgroup_reparent_charges(struct 
mem_cgroup *memcg)
/* This is for making all *used* pages to be on LRU. */
lru_add_drain_all();
drain_all_stock_sync(memcg);
-   ret = 0;
mem_cgroup_start_move(memcg);
for_each_node_state(node, N_HIGH_MEMORY) {
-   for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
+   for (zid = 0; zid < MAX_NR_ZONES; zid++) {
enum lru_list lru;
for_each_lru(lru) {
- 

[PATCH 3/6] memcg: Simplify mem_cgroup_force_empty_list error handling

2012-10-17 Thread Michal Hocko
mem_cgroup_force_empty_list currently tries to remove all pages from
the given LRU. To prevent from temoporary failures (EBUSY returned by
mem_cgroup_move_parent) it uses a margin to the current LRU pages and
returns the true if there are still some pages left on the list.

If we consider that mem_cgroup_move_parent fails only when we are racing
with somebody else removing the page (resp. uncharging it) or when the
page is migrated then it is obvious that all those failures are only
temporal and so we can safely retry later.
Let's get rid of the safety margin and make the loop really wait for the
empty LRU. The caller should still make sure that all charges have been
removed from the res_counter because mem_cgroup_replace_page_cache might
add a page to the LRU after the check (it doesn't touch res_counter
though).
This catches most of the cases except for shmem which might call
mem_cgroup_replace_page_cache with a page which is not charged and on
the LRU yet but this was the case also without this patch. In order to
fix this we need a guarantee that try_get_mem_cgroup_from_page falls
back to the current mm's cgroup so it needs css_tryget to fail. This
will be fixed up in a later patch because it nees a help from cgroup
core.

Signed-off-by: Michal Hocko mho...@suse.cz
---
 mm/memcontrol.c |   52 +++-
 1 file changed, 27 insertions(+), 25 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9ce24b7..f57ba4c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2697,9 +2697,13 @@ out:
 }
 
 /*
- * move charges to its parent.
+ * move charges to its parent or the root cgroup if the group
+ * has no parent (aka use_hierarchy==0).
+ * Although this might fail the failure is always temporary and it
+ * signals a race with a page removal/uncharge or migration. In the
+ * first case the page will vanish from the LRU on the next attempt
+ * and the call should be retried later.
  */
-
 static int mem_cgroup_move_parent(struct page *page,
  struct page_cgroup *pc,
  struct mem_cgroup *child)
@@ -2726,8 +2730,10 @@ static int mem_cgroup_move_parent(struct page *page,
if (!parent)
parent = root_mem_cgroup;
 
-   if (nr_pages  1)
+   if (nr_pages  1) {
+   VM_BUG_ON(!PageTransHuge(page));
flags = compound_lock_irqsave(page);
+   }
 
ret = mem_cgroup_move_account(page, nr_pages,
pc, child, parent);
@@ -3679,15 +3685,14 @@ unsigned long mem_cgroup_soft_limit_reclaim(struct zone 
*zone, int order,
 
 /*
  * Traverse a specified page_cgroup list and try to drop them all.  This 
doesn't
- * reclaim the pages page themselves - it just removes the page_cgroups.
- * Returns true if some page_cgroups were not freed, indicating that the caller
- * must retry this operation.
+ * reclaim the pages page themselves - pages are moved to the parent (or root)
+ * group.
  */
-static bool mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
+static void mem_cgroup_force_empty_list(struct mem_cgroup *memcg,
int node, int zid, enum lru_list lru)
 {
struct mem_cgroup_per_zone *mz;
-   unsigned long flags, loop;
+   unsigned long flags;
struct list_head *list;
struct page *busy;
struct zone *zone;
@@ -3696,11 +3701,8 @@ static bool mem_cgroup_force_empty_list(struct 
mem_cgroup *memcg,
mz = mem_cgroup_zoneinfo(memcg, node, zid);
list = mz-lruvec.lists[lru];
 
-   loop = mz-lru_size[lru];
-   /* give some margin against EBUSY etc...*/
-   loop += 256;
busy = NULL;
-   while (loop--) {
+   do {
struct page_cgroup *pc;
struct page *page;
 
@@ -3726,8 +3728,7 @@ static bool mem_cgroup_force_empty_list(struct mem_cgroup 
*memcg,
cond_resched();
} else
busy = NULL;
-   }
-   return !list_empty(list);
+   } while (!list_empty(list));
 }
 
 /*
@@ -3741,7 +3742,6 @@ static int mem_cgroup_reparent_charges(struct mem_cgroup 
*memcg)
 {
struct cgroup *cgrp = memcg-css.cgroup;
int node, zid;
-   int ret;
 
do {
if (cgroup_task_count(cgrp) || !list_empty(cgrp-children))
@@ -3749,28 +3749,30 @@ static int mem_cgroup_reparent_charges(struct 
mem_cgroup *memcg)
/* This is for making all *used* pages to be on LRU. */
lru_add_drain_all();
drain_all_stock_sync(memcg);
-   ret = 0;
mem_cgroup_start_move(memcg);
for_each_node_state(node, N_HIGH_MEMORY) {
-   for (zid = 0; !ret  zid  MAX_NR_ZONES; zid++) {
+   for (zid = 0; zid  MAX_NR_ZONES; zid++) {
enum lru_list lru;