Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-23 Thread Hugh Dickins
On Thu, 23 Jan 2014, Michal Hocko wrote:
> On Thu 23-01-14 02:42:58, Hugh Dickins wrote:
> > > 
> > > Actually both patches are needed. If we had only 2/2 then we wouldn't
> > > endless loop inside mem_cgroup_iter but we could still return root to
> > > caller all the time because mem_cgroup_iter_load would return NULL on
> > > css_tryget failure on the cached root. Or am I missing something that
> > > would prevent that?
> > 
> > In theory I agree with you; and if you're missing something, I can't see
> > it either.  But in practice, all my earlier testing of 3.12 and 3.13 was
> > just with 2/2, and I've tried without your 1/2 since: whereas I have hung
> > on 3.12 and 3.13 a convincing number of times without 2/2, I have never
> > hung on them with 2/2 without 1/2.  In practice 1/2 appears essential
> > for 3.10 and 3.11, but irrelevant for 3.12 and 3.13.
> > 
> > That could be easy to explain if there were a difference at the calling
> > end, shrink_zone(), between those releases: but I don't see that.  Odd.
> > Either we're both missing something, or my testing is even less reliable
> > than I'd thought.  But since I certainly don't dispute 1/2, it is merely
> > academic.  Though still bothersome.
> 
> I would assume that it is (sc->nr_reclaimed >= sc->nr_to_reclaim) that
> helps us to back off. SWAP_CLUSTER_MAX shouldn't be that hard to get to
> before css_offline racing part reparents all the memory.

But wouldn't explain why I could see it on 3.10,11 but not on 3.12,13.

Perhaps the 2/2 problem is a lot easier to hit than the 1/2 problem,
and I mistakenly expected to see the 1/2 problem in the timescale that
I saw the 2/2 problem; but I don't really think either is the case.

> 
> Anyway, I would feel safer if this was pushed fixed although you haven't
> reporoduced it.

Absolutely.

> > Before Andrew sends these all off to Linus, I should admit that there's
> > probably a refinement still to come to the CSS_ONLINE one.  I'm ashamed
> > to admit that I overlooked a much earlier comment from Greg Thelen, who
> > suggested that a memory barrier might be required.
> 
> I was thinking about mem barrier while reviewing your patch but then I
> convinced myself that we should be safe also without using one when
> checking CSS_ONLINE.
> We have basically two situations.
>   - online_css when we can miss it being set which is OK because
> we would miss a new empty group.
>   - offline_css when we could still see the flag being set but
> then css_tryget would be already failing.
> 
> So while all this is subtle and relies on cgroup core heavily I think we
> should be safe wrt. memory barriers.
> 
> Or did you mean something else here?

Something else.  My CSS_OFFLINE patch claims to prevent the iterator
from returning an uninitialized struct mem_cgroup: if that is to be
relied upon, then it needs to make sure that the initialization of
the mem_cgroup is visible to the caller before the CSS_OFFLINE flag.

kernel/cgroup.c online_css() does nowadays have an smp_wmb() buried
in its rcu_assign_pointer(); but it's not in the right place to
make this particular guarantee.  And an smp_rmb() needed somewhere
too, if it doesn't already come for free somehow.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-23 Thread Michal Hocko
On Thu 23-01-14 02:42:58, Hugh Dickins wrote:
> On Wed, 22 Jan 2014, Michal Hocko wrote:
> > On Tue 21-01-14 13:18:42, Hugh Dickins wrote:
> > [...]
> > > We do have a confusing situation.  The hang goes back to 3.10 but takes
> > > two different forms, because of intervening changes: in 3.10 and 3.11
> > > mem_cgroup_iter repeatedly returns root memcg to its caller, in 3.12 and
> > > 3.13 mem_cgroup_iter repeatedly gets NULL memcg from mem_cgroup_iter_next
> > > and cannot return to its caller.
> > > 
> > > Patch 1/2 is what's needed to fix 3.10 and 3.11 (and applies correctly
> > > to 3.11, but will have to be rediffed for 3.10 because of rearrangement
> > > in between). 
> > 
> > I will backport it when it reaches stable queue.
> 
> Thank you.
> 
> Testing, at home and at work, has confirmed your two patches are good.

Great. Thanks a lot!

> Greg's particular test on 3.11 gave convincing results, and I was helped
> by Linus's tree of last night (df32e43a54d0) happening to be quite quick
> to reproduce the issue on my laptop.
> 
> > 
> > > Patch 2/2 is what's needed to fix 3.12 and 3.13 (but applies
> > > correctly to neither of them because it's diffed on top of my CSS_ONLINE
> > > fix).  Patch 1/2 is correct but unnecessary in 3.12 and 3.13: I'm unclear
> > > whether Michal is claiming that it would also fix the hang in 3.12 and
> > > 3.13 if we didn't have 2/2: I doubt that, and haven't tested that.
> > 
> > Actually both patches are needed. If we had only 2/2 then we wouldn't
> > endless loop inside mem_cgroup_iter but we could still return root to
> > caller all the time because mem_cgroup_iter_load would return NULL on
> > css_tryget failure on the cached root. Or am I missing something that
> > would prevent that?
> 
> In theory I agree with you; and if you're missing something, I can't see
> it either.  But in practice, all my earlier testing of 3.12 and 3.13 was
> just with 2/2, and I've tried without your 1/2 since: whereas I have hung
> on 3.12 and 3.13 a convincing number of times without 2/2, I have never
> hung on them with 2/2 without 1/2.  In practice 1/2 appears essential
> for 3.10 and 3.11, but irrelevant for 3.12 and 3.13.
> 
> That could be easy to explain if there were a difference at the calling
> end, shrink_zone(), between those releases: but I don't see that.  Odd.
> Either we're both missing something, or my testing is even less reliable
> than I'd thought.  But since I certainly don't dispute 1/2, it is merely
> academic.  Though still bothersome.

I would assume that it is (sc->nr_reclaimed >= sc->nr_to_reclaim) that
helps us to back off. SWAP_CLUSTER_MAX shouldn't be that hard to get to
before css_offline racing part reparents all the memory.

Anyway, I would feel safer if this was pushed fixed although you haven't
reporoduced it.
 
> > > Given how Michal has diffed this patch on top of my CSS_ONLINE one
> > > (mm-memcg-iteration-skip-memcgs-not-yet-fully-initialized.patch),
> > > it would be helpful if you could mark that one also for stable 3.12+,
> > > to save us from having to rediff this one for stable.  We don't have
> > > a concrete example of a problem it solves in the vanilla kernel, but
> > > it makes more sense to include it than to exclude it.
> > 
> > Yes, I think it makes sense to queue it for 3.12+ as well because it is
> > non intrusive and potential issues would be really subtle.
> 
> Before Andrew sends these all off to Linus, I should admit that there's
> probably a refinement still to come to the CSS_ONLINE one.  I'm ashamed
> to admit that I overlooked a much earlier comment from Greg Thelen, who
> suggested that a memory barrier might be required.

I was thinking about mem barrier while reviewing your patch but then I
convinced myself that we should be safe also without using one when
checking CSS_ONLINE.
We have basically two situations.
- online_css when we can miss it being set which is OK because
  we would miss a new empty group.
- offline_css when we could still see the flag being set but
  then css_tryget would be already failing.

So while all this is subtle and relies on cgroup core heavily I think we
should be safe wrt. memory barriers.

Or did you mean something else here?

> I think he's right, and I'd have liked to say exactly what and where
> before answering you now; but barriers are tricky elusive things, I've
> not yet decided, and better report back to you now on the testing
> result.
> 
> Hugh

Thanks!
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-23 Thread Hugh Dickins
On Wed, 22 Jan 2014, Michal Hocko wrote:
> On Tue 21-01-14 13:18:42, Hugh Dickins wrote:
> [...]
> > We do have a confusing situation.  The hang goes back to 3.10 but takes
> > two different forms, because of intervening changes: in 3.10 and 3.11
> > mem_cgroup_iter repeatedly returns root memcg to its caller, in 3.12 and
> > 3.13 mem_cgroup_iter repeatedly gets NULL memcg from mem_cgroup_iter_next
> > and cannot return to its caller.
> > 
> > Patch 1/2 is what's needed to fix 3.10 and 3.11 (and applies correctly
> > to 3.11, but will have to be rediffed for 3.10 because of rearrangement
> > in between). 
> 
> I will backport it when it reaches stable queue.

Thank you.

Testing, at home and at work, has confirmed your two patches are good.
Greg's particular test on 3.11 gave convincing results, and I was helped
by Linus's tree of last night (df32e43a54d0) happening to be quite quick
to reproduce the issue on my laptop.

> 
> > Patch 2/2 is what's needed to fix 3.12 and 3.13 (but applies
> > correctly to neither of them because it's diffed on top of my CSS_ONLINE
> > fix).  Patch 1/2 is correct but unnecessary in 3.12 and 3.13: I'm unclear
> > whether Michal is claiming that it would also fix the hang in 3.12 and
> > 3.13 if we didn't have 2/2: I doubt that, and haven't tested that.
> 
> Actually both patches are needed. If we had only 2/2 then we wouldn't
> endless loop inside mem_cgroup_iter but we could still return root to
> caller all the time because mem_cgroup_iter_load would return NULL on
> css_tryget failure on the cached root. Or am I missing something that
> would prevent that?

In theory I agree with you; and if you're missing something, I can't see
it either.  But in practice, all my earlier testing of 3.12 and 3.13 was
just with 2/2, and I've tried without your 1/2 since: whereas I have hung
on 3.12 and 3.13 a convincing number of times without 2/2, I have never
hung on them with 2/2 without 1/2.  In practice 1/2 appears essential
for 3.10 and 3.11, but irrelevant for 3.12 and 3.13.

That could be easy to explain if there were a difference at the calling
end, shrink_zone(), between those releases: but I don't see that.  Odd.
Either we're both missing something, or my testing is even less reliable
than I'd thought.  But since I certainly don't dispute 1/2, it is merely
academic.  Though still bothersome.

> 
> > Given how Michal has diffed this patch on top of my CSS_ONLINE one
> > (mm-memcg-iteration-skip-memcgs-not-yet-fully-initialized.patch),
> > it would be helpful if you could mark that one also for stable 3.12+,
> > to save us from having to rediff this one for stable.  We don't have
> > a concrete example of a problem it solves in the vanilla kernel, but
> > it makes more sense to include it than to exclude it.
> 
> Yes, I think it makes sense to queue it for 3.12+ as well because it is
> non intrusive and potential issues would be really subtle.

Before Andrew sends these all off to Linus, I should admit that there's
probably a refinement still to come to the CSS_ONLINE one.  I'm ashamed
to admit that I overlooked a much earlier comment from Greg Thelen, who
suggested that a memory barrier might be required.  I think he's right,
and I'd have liked to say exactly what and where before answering you
now; but barriers are tricky elusive things, I've not yet decided,
and better report back to you now on the testing result.

Hugh

> 
> > (You would be right to point out that the CSS_ONLINE one fixes
> > something that goes back to 3.10: I'm saying 3.12+ because I'm not
> > motivated to rediff it for 3.10 and 3.11 when there's nothing to
> > go on top; but that's not a very good reason to lie - overrule me.)
> > 
> > Hugh
> 
> -- 
> Michal Hocko
> SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-23 Thread Hugh Dickins
On Wed, 22 Jan 2014, Michal Hocko wrote:
 On Tue 21-01-14 13:18:42, Hugh Dickins wrote:
 [...]
  We do have a confusing situation.  The hang goes back to 3.10 but takes
  two different forms, because of intervening changes: in 3.10 and 3.11
  mem_cgroup_iter repeatedly returns root memcg to its caller, in 3.12 and
  3.13 mem_cgroup_iter repeatedly gets NULL memcg from mem_cgroup_iter_next
  and cannot return to its caller.
  
  Patch 1/2 is what's needed to fix 3.10 and 3.11 (and applies correctly
  to 3.11, but will have to be rediffed for 3.10 because of rearrangement
  in between). 
 
 I will backport it when it reaches stable queue.

Thank you.

Testing, at home and at work, has confirmed your two patches are good.
Greg's particular test on 3.11 gave convincing results, and I was helped
by Linus's tree of last night (df32e43a54d0) happening to be quite quick
to reproduce the issue on my laptop.

 
  Patch 2/2 is what's needed to fix 3.12 and 3.13 (but applies
  correctly to neither of them because it's diffed on top of my CSS_ONLINE
  fix).  Patch 1/2 is correct but unnecessary in 3.12 and 3.13: I'm unclear
  whether Michal is claiming that it would also fix the hang in 3.12 and
  3.13 if we didn't have 2/2: I doubt that, and haven't tested that.
 
 Actually both patches are needed. If we had only 2/2 then we wouldn't
 endless loop inside mem_cgroup_iter but we could still return root to
 caller all the time because mem_cgroup_iter_load would return NULL on
 css_tryget failure on the cached root. Or am I missing something that
 would prevent that?

In theory I agree with you; and if you're missing something, I can't see
it either.  But in practice, all my earlier testing of 3.12 and 3.13 was
just with 2/2, and I've tried without your 1/2 since: whereas I have hung
on 3.12 and 3.13 a convincing number of times without 2/2, I have never
hung on them with 2/2 without 1/2.  In practice 1/2 appears essential
for 3.10 and 3.11, but irrelevant for 3.12 and 3.13.

That could be easy to explain if there were a difference at the calling
end, shrink_zone(), between those releases: but I don't see that.  Odd.
Either we're both missing something, or my testing is even less reliable
than I'd thought.  But since I certainly don't dispute 1/2, it is merely
academic.  Though still bothersome.

 
  Given how Michal has diffed this patch on top of my CSS_ONLINE one
  (mm-memcg-iteration-skip-memcgs-not-yet-fully-initialized.patch),
  it would be helpful if you could mark that one also for stable 3.12+,
  to save us from having to rediff this one for stable.  We don't have
  a concrete example of a problem it solves in the vanilla kernel, but
  it makes more sense to include it than to exclude it.
 
 Yes, I think it makes sense to queue it for 3.12+ as well because it is
 non intrusive and potential issues would be really subtle.

Before Andrew sends these all off to Linus, I should admit that there's
probably a refinement still to come to the CSS_ONLINE one.  I'm ashamed
to admit that I overlooked a much earlier comment from Greg Thelen, who
suggested that a memory barrier might be required.  I think he's right,
and I'd have liked to say exactly what and where before answering you
now; but barriers are tricky elusive things, I've not yet decided,
and better report back to you now on the testing result.

Hugh

 
  (You would be right to point out that the CSS_ONLINE one fixes
  something that goes back to 3.10: I'm saying 3.12+ because I'm not
  motivated to rediff it for 3.10 and 3.11 when there's nothing to
  go on top; but that's not a very good reason to lie - overrule me.)
  
  Hugh
 
 -- 
 Michal Hocko
 SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-23 Thread Michal Hocko
On Thu 23-01-14 02:42:58, Hugh Dickins wrote:
 On Wed, 22 Jan 2014, Michal Hocko wrote:
  On Tue 21-01-14 13:18:42, Hugh Dickins wrote:
  [...]
   We do have a confusing situation.  The hang goes back to 3.10 but takes
   two different forms, because of intervening changes: in 3.10 and 3.11
   mem_cgroup_iter repeatedly returns root memcg to its caller, in 3.12 and
   3.13 mem_cgroup_iter repeatedly gets NULL memcg from mem_cgroup_iter_next
   and cannot return to its caller.
   
   Patch 1/2 is what's needed to fix 3.10 and 3.11 (and applies correctly
   to 3.11, but will have to be rediffed for 3.10 because of rearrangement
   in between). 
  
  I will backport it when it reaches stable queue.
 
 Thank you.
 
 Testing, at home and at work, has confirmed your two patches are good.

Great. Thanks a lot!

 Greg's particular test on 3.11 gave convincing results, and I was helped
 by Linus's tree of last night (df32e43a54d0) happening to be quite quick
 to reproduce the issue on my laptop.
 
  
   Patch 2/2 is what's needed to fix 3.12 and 3.13 (but applies
   correctly to neither of them because it's diffed on top of my CSS_ONLINE
   fix).  Patch 1/2 is correct but unnecessary in 3.12 and 3.13: I'm unclear
   whether Michal is claiming that it would also fix the hang in 3.12 and
   3.13 if we didn't have 2/2: I doubt that, and haven't tested that.
  
  Actually both patches are needed. If we had only 2/2 then we wouldn't
  endless loop inside mem_cgroup_iter but we could still return root to
  caller all the time because mem_cgroup_iter_load would return NULL on
  css_tryget failure on the cached root. Or am I missing something that
  would prevent that?
 
 In theory I agree with you; and if you're missing something, I can't see
 it either.  But in practice, all my earlier testing of 3.12 and 3.13 was
 just with 2/2, and I've tried without your 1/2 since: whereas I have hung
 on 3.12 and 3.13 a convincing number of times without 2/2, I have never
 hung on them with 2/2 without 1/2.  In practice 1/2 appears essential
 for 3.10 and 3.11, but irrelevant for 3.12 and 3.13.
 
 That could be easy to explain if there were a difference at the calling
 end, shrink_zone(), between those releases: but I don't see that.  Odd.
 Either we're both missing something, or my testing is even less reliable
 than I'd thought.  But since I certainly don't dispute 1/2, it is merely
 academic.  Though still bothersome.

I would assume that it is (sc-nr_reclaimed = sc-nr_to_reclaim) that
helps us to back off. SWAP_CLUSTER_MAX shouldn't be that hard to get to
before css_offline racing part reparents all the memory.

Anyway, I would feel safer if this was pushed fixed although you haven't
reporoduced it.
 
   Given how Michal has diffed this patch on top of my CSS_ONLINE one
   (mm-memcg-iteration-skip-memcgs-not-yet-fully-initialized.patch),
   it would be helpful if you could mark that one also for stable 3.12+,
   to save us from having to rediff this one for stable.  We don't have
   a concrete example of a problem it solves in the vanilla kernel, but
   it makes more sense to include it than to exclude it.
  
  Yes, I think it makes sense to queue it for 3.12+ as well because it is
  non intrusive and potential issues would be really subtle.
 
 Before Andrew sends these all off to Linus, I should admit that there's
 probably a refinement still to come to the CSS_ONLINE one.  I'm ashamed
 to admit that I overlooked a much earlier comment from Greg Thelen, who
 suggested that a memory barrier might be required.

I was thinking about mem barrier while reviewing your patch but then I
convinced myself that we should be safe also without using one when
checking CSS_ONLINE.
We have basically two situations.
- online_css when we can miss it being set which is OK because
  we would miss a new empty group.
- offline_css when we could still see the flag being set but
  then css_tryget would be already failing.

So while all this is subtle and relies on cgroup core heavily I think we
should be safe wrt. memory barriers.

Or did you mean something else here?

 I think he's right, and I'd have liked to say exactly what and where
 before answering you now; but barriers are tricky elusive things, I've
 not yet decided, and better report back to you now on the testing
 result.
 
 Hugh

Thanks!
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-23 Thread Hugh Dickins
On Thu, 23 Jan 2014, Michal Hocko wrote:
 On Thu 23-01-14 02:42:58, Hugh Dickins wrote:
   
   Actually both patches are needed. If we had only 2/2 then we wouldn't
   endless loop inside mem_cgroup_iter but we could still return root to
   caller all the time because mem_cgroup_iter_load would return NULL on
   css_tryget failure on the cached root. Or am I missing something that
   would prevent that?
  
  In theory I agree with you; and if you're missing something, I can't see
  it either.  But in practice, all my earlier testing of 3.12 and 3.13 was
  just with 2/2, and I've tried without your 1/2 since: whereas I have hung
  on 3.12 and 3.13 a convincing number of times without 2/2, I have never
  hung on them with 2/2 without 1/2.  In practice 1/2 appears essential
  for 3.10 and 3.11, but irrelevant for 3.12 and 3.13.
  
  That could be easy to explain if there were a difference at the calling
  end, shrink_zone(), between those releases: but I don't see that.  Odd.
  Either we're both missing something, or my testing is even less reliable
  than I'd thought.  But since I certainly don't dispute 1/2, it is merely
  academic.  Though still bothersome.
 
 I would assume that it is (sc-nr_reclaimed = sc-nr_to_reclaim) that
 helps us to back off. SWAP_CLUSTER_MAX shouldn't be that hard to get to
 before css_offline racing part reparents all the memory.

But wouldn't explain why I could see it on 3.10,11 but not on 3.12,13.

Perhaps the 2/2 problem is a lot easier to hit than the 1/2 problem,
and I mistakenly expected to see the 1/2 problem in the timescale that
I saw the 2/2 problem; but I don't really think either is the case.

 
 Anyway, I would feel safer if this was pushed fixed although you haven't
 reporoduced it.

Absolutely.

  Before Andrew sends these all off to Linus, I should admit that there's
  probably a refinement still to come to the CSS_ONLINE one.  I'm ashamed
  to admit that I overlooked a much earlier comment from Greg Thelen, who
  suggested that a memory barrier might be required.
 
 I was thinking about mem barrier while reviewing your patch but then I
 convinced myself that we should be safe also without using one when
 checking CSS_ONLINE.
 We have basically two situations.
   - online_css when we can miss it being set which is OK because
 we would miss a new empty group.
   - offline_css when we could still see the flag being set but
 then css_tryget would be already failing.
 
 So while all this is subtle and relies on cgroup core heavily I think we
 should be safe wrt. memory barriers.
 
 Or did you mean something else here?

Something else.  My CSS_OFFLINE patch claims to prevent the iterator
from returning an uninitialized struct mem_cgroup: if that is to be
relied upon, then it needs to make sure that the initialization of
the mem_cgroup is visible to the caller before the CSS_OFFLINE flag.

kernel/cgroup.c online_css() does nowadays have an smp_wmb() buried
in its rcu_assign_pointer(); but it's not in the right place to
make this particular guarantee.  And an smp_rmb() needed somewhere
too, if it doesn't already come for free somehow.

Hugh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-22 Thread Michal Hocko
On Tue 21-01-14 13:18:42, Hugh Dickins wrote:
[...]
> We do have a confusing situation.  The hang goes back to 3.10 but takes
> two different forms, because of intervening changes: in 3.10 and 3.11
> mem_cgroup_iter repeatedly returns root memcg to its caller, in 3.12 and
> 3.13 mem_cgroup_iter repeatedly gets NULL memcg from mem_cgroup_iter_next
> and cannot return to its caller.
> 
> Patch 1/2 is what's needed to fix 3.10 and 3.11 (and applies correctly
> to 3.11, but will have to be rediffed for 3.10 because of rearrangement
> in between). 

I will backport it when it reaches stable queue.

> Patch 2/2 is what's needed to fix 3.12 and 3.13 (but applies
> correctly to neither of them because it's diffed on top of my CSS_ONLINE
> fix).  Patch 1/2 is correct but unnecessary in 3.12 and 3.13: I'm unclear
> whether Michal is claiming that it would also fix the hang in 3.12 and
> 3.13 if we didn't have 2/2: I doubt that, and haven't tested that.

Actually both patches are needed. If we had only 2/2 then we wouldn't
endless loop inside mem_cgroup_iter but we could still return root to
caller all the time because mem_cgroup_iter_load would return NULL on
css_tryget failure on the cached root. Or am I missing something that
would prevent that?

> Given how Michal has diffed this patch on top of my CSS_ONLINE one
> (mm-memcg-iteration-skip-memcgs-not-yet-fully-initialized.patch),
> it would be helpful if you could mark that one also for stable 3.12+,
> to save us from having to rediff this one for stable.  We don't have
> a concrete example of a problem it solves in the vanilla kernel, but
> it makes more sense to include it than to exclude it.

Yes, I think it makes sense to queue it for 3.12+ as well because it is
non intrusive and potential issues would be really subtle.

> (You would be right to point out that the CSS_ONLINE one fixes
> something that goes back to 3.10: I'm saying 3.12+ because I'm not
> motivated to rediff it for 3.10 and 3.11 when there's nothing to
> go on top; but that's not a very good reason to lie - overrule me.)
> 
> Hugh

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-22 Thread Michal Hocko
On Tue 21-01-14 11:42:19, Andrew Morton wrote:
> On Tue, 21 Jan 2014 11:45:43 +0100 Michal Hocko  wrote:
> 
> > 19f39402864e (memcg: simplify mem_cgroup_iter) has reorganized
> > mem_cgroup_iter code in order to simplify it. A part of that change was
> > dropping an optimization which didn't call css_tryget on the root of
> > the walked tree. The patch however didn't change the css_put part in
> > mem_cgroup_iter which excludes root.
> > This wasn't an issue at the time because __mem_cgroup_iter_next bailed
> > out for root early without taking a reference as cgroup iterators
> > (css_next_descendant_pre) didn't visit root themselves.
> > 
> > Nevertheless cgroup iterators have been reworked to visit root by
> > bd8815a6d802 (cgroup: make css_for_each_descendant() and friends include
> > the origin css in the iteration) when the root bypass have been dropped
> > in __mem_cgroup_iter_next. This means that css_put is not called for
> > root and so css along with mem_cgroup and other cgroup internal object
> > tied by css lifetime are never freed.
> > 
> > Fix the issue by reintroducing root check in __mem_cgroup_iter_next
> > and do not take css reference for it.
> > 
> > This reference counting magic protects us also from another issue, an
> > endless loop reported by Hugh Dickins when reclaim races with root
> > removal and css_tryget called by iterator internally would fail. There
> > would be no other nodes to visit so __mem_cgroup_iter_next would return
> > NULL and mem_cgroup_iter would interpret it as "start looping from root
> > again" and so mem_cgroup_iter would loop forever internally.
> 
> I grabbed these two patches but I will sit on them for a week or so,
> pending review-n-test.

Yes, there is no rush and this needs a proper review.

> > Cc: sta...@vger.kernel.org # mem_leak part 3.12+
> 
> What does this mean?

Dohh. I had both patches in one but then I decided to split it. This is
just left over. Should be 3.12+.

Sorry about the confusion.

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-22 Thread Michal Hocko
On Tue 21-01-14 11:42:19, Andrew Morton wrote:
 On Tue, 21 Jan 2014 11:45:43 +0100 Michal Hocko mho...@suse.cz wrote:
 
  19f39402864e (memcg: simplify mem_cgroup_iter) has reorganized
  mem_cgroup_iter code in order to simplify it. A part of that change was
  dropping an optimization which didn't call css_tryget on the root of
  the walked tree. The patch however didn't change the css_put part in
  mem_cgroup_iter which excludes root.
  This wasn't an issue at the time because __mem_cgroup_iter_next bailed
  out for root early without taking a reference as cgroup iterators
  (css_next_descendant_pre) didn't visit root themselves.
  
  Nevertheless cgroup iterators have been reworked to visit root by
  bd8815a6d802 (cgroup: make css_for_each_descendant() and friends include
  the origin css in the iteration) when the root bypass have been dropped
  in __mem_cgroup_iter_next. This means that css_put is not called for
  root and so css along with mem_cgroup and other cgroup internal object
  tied by css lifetime are never freed.
  
  Fix the issue by reintroducing root check in __mem_cgroup_iter_next
  and do not take css reference for it.
  
  This reference counting magic protects us also from another issue, an
  endless loop reported by Hugh Dickins when reclaim races with root
  removal and css_tryget called by iterator internally would fail. There
  would be no other nodes to visit so __mem_cgroup_iter_next would return
  NULL and mem_cgroup_iter would interpret it as start looping from root
  again and so mem_cgroup_iter would loop forever internally.
 
 I grabbed these two patches but I will sit on them for a week or so,
 pending review-n-test.

Yes, there is no rush and this needs a proper review.

  Cc: sta...@vger.kernel.org # mem_leak part 3.12+
 
 What does this mean?

Dohh. I had both patches in one but then I decided to split it. This is
just left over. Should be 3.12+.

Sorry about the confusion.

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-22 Thread Michal Hocko
On Tue 21-01-14 13:18:42, Hugh Dickins wrote:
[...]
 We do have a confusing situation.  The hang goes back to 3.10 but takes
 two different forms, because of intervening changes: in 3.10 and 3.11
 mem_cgroup_iter repeatedly returns root memcg to its caller, in 3.12 and
 3.13 mem_cgroup_iter repeatedly gets NULL memcg from mem_cgroup_iter_next
 and cannot return to its caller.
 
 Patch 1/2 is what's needed to fix 3.10 and 3.11 (and applies correctly
 to 3.11, but will have to be rediffed for 3.10 because of rearrangement
 in between). 

I will backport it when it reaches stable queue.

 Patch 2/2 is what's needed to fix 3.12 and 3.13 (but applies
 correctly to neither of them because it's diffed on top of my CSS_ONLINE
 fix).  Patch 1/2 is correct but unnecessary in 3.12 and 3.13: I'm unclear
 whether Michal is claiming that it would also fix the hang in 3.12 and
 3.13 if we didn't have 2/2: I doubt that, and haven't tested that.

Actually both patches are needed. If we had only 2/2 then we wouldn't
endless loop inside mem_cgroup_iter but we could still return root to
caller all the time because mem_cgroup_iter_load would return NULL on
css_tryget failure on the cached root. Or am I missing something that
would prevent that?

 Given how Michal has diffed this patch on top of my CSS_ONLINE one
 (mm-memcg-iteration-skip-memcgs-not-yet-fully-initialized.patch),
 it would be helpful if you could mark that one also for stable 3.12+,
 to save us from having to rediff this one for stable.  We don't have
 a concrete example of a problem it solves in the vanilla kernel, but
 it makes more sense to include it than to exclude it.

Yes, I think it makes sense to queue it for 3.12+ as well because it is
non intrusive and potential issues would be really subtle.

 (You would be right to point out that the CSS_ONLINE one fixes
 something that goes back to 3.10: I'm saying 3.12+ because I'm not
 motivated to rediff it for 3.10 and 3.11 when there's nothing to
 go on top; but that's not a very good reason to lie - overrule me.)
 
 Hugh

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-21 Thread Hugh Dickins
On Tue, 21 Jan 2014, Andrew Morton wrote:
> On Tue, 21 Jan 2014 11:45:43 +0100 Michal Hocko  wrote:
> 
> > 19f39402864e (memcg: simplify mem_cgroup_iter) has reorganized
> > mem_cgroup_iter code in order to simplify it. A part of that change was
> > dropping an optimization which didn't call css_tryget on the root of
> > the walked tree. The patch however didn't change the css_put part in
> > mem_cgroup_iter which excludes root.
> > This wasn't an issue at the time because __mem_cgroup_iter_next bailed
> > out for root early without taking a reference as cgroup iterators
> > (css_next_descendant_pre) didn't visit root themselves.
> > 
> > Nevertheless cgroup iterators have been reworked to visit root by
> > bd8815a6d802 (cgroup: make css_for_each_descendant() and friends include
> > the origin css in the iteration) when the root bypass have been dropped
> > in __mem_cgroup_iter_next. This means that css_put is not called for
> > root and so css along with mem_cgroup and other cgroup internal object
> > tied by css lifetime are never freed.
> > 
> > Fix the issue by reintroducing root check in __mem_cgroup_iter_next
> > and do not take css reference for it.
> > 
> > This reference counting magic protects us also from another issue, an
> > endless loop reported by Hugh Dickins when reclaim races with root
> > removal and css_tryget called by iterator internally would fail. There
> > would be no other nodes to visit so __mem_cgroup_iter_next would return
> > NULL and mem_cgroup_iter would interpret it as "start looping from root
> > again" and so mem_cgroup_iter would loop forever internally.
> 
> I grabbed these two patches but I will sit on them for a week or so,
> pending review-n-test.

Thank you, yes, I'm about to give them more testing.

> 
> > Cc: sta...@vger.kernel.org # mem_leak part 3.12+
> 
> What does this mean?

It's certainly a confusing comment.

I suggest just deleting the "mem_leak part ": Michal isn't referring to
any two parts of the patch itself, but to parts of his commit comment;
but it's still unclear what he's claiming.

We do have a confusing situation.  The hang goes back to 3.10 but takes
two different forms, because of intervening changes: in 3.10 and 3.11
mem_cgroup_iter repeatedly returns root memcg to its caller, in 3.12 and
3.13 mem_cgroup_iter repeatedly gets NULL memcg from mem_cgroup_iter_next
and cannot return to its caller.

Patch 1/2 is what's needed to fix 3.10 and 3.11 (and applies correctly
to 3.11, but will have to be rediffed for 3.10 because of rearrangement
in between).  Patch 2/2 is what's needed to fix 3.12 and 3.13 (but applies
correctly to neither of them because it's diffed on top of my CSS_ONLINE
fix).  Patch 1/2 is correct but unnecessary in 3.12 and 3.13: I'm unclear
whether Michal is claiming that it would also fix the hang in 3.12 and
3.13 if we didn't have 2/2: I doubt that, and haven't tested that.

Given how Michal has diffed this patch on top of my CSS_ONLINE one
(mm-memcg-iteration-skip-memcgs-not-yet-fully-initialized.patch),
it would be helpful if you could mark that one also for stable 3.12+,
to save us from having to rediff this one for stable.  We don't have
a concrete example of a problem it solves in the vanilla kernel, but
it makes more sense to include it than to exclude it.

(You would be right to point out that the CSS_ONLINE one fixes
something that goes back to 3.10: I'm saying 3.12+ because I'm not
motivated to rediff it for 3.10 and 3.11 when there's nothing to
go on top; but that's not a very good reason to lie - overrule me.)

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-21 Thread Andrew Morton
On Tue, 21 Jan 2014 11:45:43 +0100 Michal Hocko  wrote:

> 19f39402864e (memcg: simplify mem_cgroup_iter) has reorganized
> mem_cgroup_iter code in order to simplify it. A part of that change was
> dropping an optimization which didn't call css_tryget on the root of
> the walked tree. The patch however didn't change the css_put part in
> mem_cgroup_iter which excludes root.
> This wasn't an issue at the time because __mem_cgroup_iter_next bailed
> out for root early without taking a reference as cgroup iterators
> (css_next_descendant_pre) didn't visit root themselves.
> 
> Nevertheless cgroup iterators have been reworked to visit root by
> bd8815a6d802 (cgroup: make css_for_each_descendant() and friends include
> the origin css in the iteration) when the root bypass have been dropped
> in __mem_cgroup_iter_next. This means that css_put is not called for
> root and so css along with mem_cgroup and other cgroup internal object
> tied by css lifetime are never freed.
> 
> Fix the issue by reintroducing root check in __mem_cgroup_iter_next
> and do not take css reference for it.
> 
> This reference counting magic protects us also from another issue, an
> endless loop reported by Hugh Dickins when reclaim races with root
> removal and css_tryget called by iterator internally would fail. There
> would be no other nodes to visit so __mem_cgroup_iter_next would return
> NULL and mem_cgroup_iter would interpret it as "start looping from root
> again" and so mem_cgroup_iter would loop forever internally.

I grabbed these two patches but I will sit on them for a week or so,
pending review-n-test.

> Cc: sta...@vger.kernel.org # mem_leak part 3.12+

What does this mean?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-21 Thread Michal Hocko
19f39402864e (memcg: simplify mem_cgroup_iter) has reorganized
mem_cgroup_iter code in order to simplify it. A part of that change was
dropping an optimization which didn't call css_tryget on the root of
the walked tree. The patch however didn't change the css_put part in
mem_cgroup_iter which excludes root.
This wasn't an issue at the time because __mem_cgroup_iter_next bailed
out for root early without taking a reference as cgroup iterators
(css_next_descendant_pre) didn't visit root themselves.

Nevertheless cgroup iterators have been reworked to visit root by
bd8815a6d802 (cgroup: make css_for_each_descendant() and friends include
the origin css in the iteration) when the root bypass have been dropped
in __mem_cgroup_iter_next. This means that css_put is not called for
root and so css along with mem_cgroup and other cgroup internal object
tied by css lifetime are never freed.

Fix the issue by reintroducing root check in __mem_cgroup_iter_next
and do not take css reference for it.

This reference counting magic protects us also from another issue, an
endless loop reported by Hugh Dickins when reclaim races with root
removal and css_tryget called by iterator internally would fail. There
would be no other nodes to visit so __mem_cgroup_iter_next would return
NULL and mem_cgroup_iter would interpret it as "start looping from root
again" and so mem_cgroup_iter would loop forever internally.

Cc: sta...@vger.kernel.org # mem_leak part 3.12+
Reported-by: Hugh Dickins 
Signed-off-by: Michal Hocko 
---
 mm/memcontrol.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 45786dc129dc..55bb1f8c6907 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1076,14 +1076,22 @@ skip_node:
 * skipped and we should continue the tree walk.
 * last_visited css is safe to use because it is
 * protected by css_get and the tree walk is rcu safe.
+*
+* We do not take a reference on the root of the tree walk
+* because we might race with the root removal when it would
+* be the only node in the iterated hierarchy and mem_cgroup_iter
+* would end up in an endless loop because it expects that at
+* least one valid node will be returned. Root cannot disappear
+* because caller of the iterator should hold it already so
+* skipping css reference should be safe.
 */
if (next_css) {
-   if ((next_css->flags & CSS_ONLINE) && css_tryget(next_css))
+   if ((next_css->flags & CSS_ONLINE) &&
+   (next_css == >css || 
css_tryget(next_css)))
return mem_cgroup_from_css(next_css);
-   else {
-   prev_css = next_css;
-   goto skip_node;
-   }
+
+   prev_css = next_css;
+   goto skip_node;
}
 
return NULL;
-- 
1.8.5.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-21 Thread Michal Hocko
19f39402864e (memcg: simplify mem_cgroup_iter) has reorganized
mem_cgroup_iter code in order to simplify it. A part of that change was
dropping an optimization which didn't call css_tryget on the root of
the walked tree. The patch however didn't change the css_put part in
mem_cgroup_iter which excludes root.
This wasn't an issue at the time because __mem_cgroup_iter_next bailed
out for root early without taking a reference as cgroup iterators
(css_next_descendant_pre) didn't visit root themselves.

Nevertheless cgroup iterators have been reworked to visit root by
bd8815a6d802 (cgroup: make css_for_each_descendant() and friends include
the origin css in the iteration) when the root bypass have been dropped
in __mem_cgroup_iter_next. This means that css_put is not called for
root and so css along with mem_cgroup and other cgroup internal object
tied by css lifetime are never freed.

Fix the issue by reintroducing root check in __mem_cgroup_iter_next
and do not take css reference for it.

This reference counting magic protects us also from another issue, an
endless loop reported by Hugh Dickins when reclaim races with root
removal and css_tryget called by iterator internally would fail. There
would be no other nodes to visit so __mem_cgroup_iter_next would return
NULL and mem_cgroup_iter would interpret it as start looping from root
again and so mem_cgroup_iter would loop forever internally.

Cc: sta...@vger.kernel.org # mem_leak part 3.12+
Reported-by: Hugh Dickins hu...@google.com
Signed-off-by: Michal Hocko mho...@suse.cz
---
 mm/memcontrol.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 45786dc129dc..55bb1f8c6907 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1076,14 +1076,22 @@ skip_node:
 * skipped and we should continue the tree walk.
 * last_visited css is safe to use because it is
 * protected by css_get and the tree walk is rcu safe.
+*
+* We do not take a reference on the root of the tree walk
+* because we might race with the root removal when it would
+* be the only node in the iterated hierarchy and mem_cgroup_iter
+* would end up in an endless loop because it expects that at
+* least one valid node will be returned. Root cannot disappear
+* because caller of the iterator should hold it already so
+* skipping css reference should be safe.
 */
if (next_css) {
-   if ((next_css-flags  CSS_ONLINE)  css_tryget(next_css))
+   if ((next_css-flags  CSS_ONLINE) 
+   (next_css == root-css || 
css_tryget(next_css)))
return mem_cgroup_from_css(next_css);
-   else {
-   prev_css = next_css;
-   goto skip_node;
-   }
+
+   prev_css = next_css;
+   goto skip_node;
}
 
return NULL;
-- 
1.8.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-21 Thread Andrew Morton
On Tue, 21 Jan 2014 11:45:43 +0100 Michal Hocko mho...@suse.cz wrote:

 19f39402864e (memcg: simplify mem_cgroup_iter) has reorganized
 mem_cgroup_iter code in order to simplify it. A part of that change was
 dropping an optimization which didn't call css_tryget on the root of
 the walked tree. The patch however didn't change the css_put part in
 mem_cgroup_iter which excludes root.
 This wasn't an issue at the time because __mem_cgroup_iter_next bailed
 out for root early without taking a reference as cgroup iterators
 (css_next_descendant_pre) didn't visit root themselves.
 
 Nevertheless cgroup iterators have been reworked to visit root by
 bd8815a6d802 (cgroup: make css_for_each_descendant() and friends include
 the origin css in the iteration) when the root bypass have been dropped
 in __mem_cgroup_iter_next. This means that css_put is not called for
 root and so css along with mem_cgroup and other cgroup internal object
 tied by css lifetime are never freed.
 
 Fix the issue by reintroducing root check in __mem_cgroup_iter_next
 and do not take css reference for it.
 
 This reference counting magic protects us also from another issue, an
 endless loop reported by Hugh Dickins when reclaim races with root
 removal and css_tryget called by iterator internally would fail. There
 would be no other nodes to visit so __mem_cgroup_iter_next would return
 NULL and mem_cgroup_iter would interpret it as start looping from root
 again and so mem_cgroup_iter would loop forever internally.

I grabbed these two patches but I will sit on them for a week or so,
pending review-n-test.

 Cc: sta...@vger.kernel.org # mem_leak part 3.12+

What does this mean?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loop in mem_cgroup_iter

2014-01-21 Thread Hugh Dickins
On Tue, 21 Jan 2014, Andrew Morton wrote:
 On Tue, 21 Jan 2014 11:45:43 +0100 Michal Hocko mho...@suse.cz wrote:
 
  19f39402864e (memcg: simplify mem_cgroup_iter) has reorganized
  mem_cgroup_iter code in order to simplify it. A part of that change was
  dropping an optimization which didn't call css_tryget on the root of
  the walked tree. The patch however didn't change the css_put part in
  mem_cgroup_iter which excludes root.
  This wasn't an issue at the time because __mem_cgroup_iter_next bailed
  out for root early without taking a reference as cgroup iterators
  (css_next_descendant_pre) didn't visit root themselves.
  
  Nevertheless cgroup iterators have been reworked to visit root by
  bd8815a6d802 (cgroup: make css_for_each_descendant() and friends include
  the origin css in the iteration) when the root bypass have been dropped
  in __mem_cgroup_iter_next. This means that css_put is not called for
  root and so css along with mem_cgroup and other cgroup internal object
  tied by css lifetime are never freed.
  
  Fix the issue by reintroducing root check in __mem_cgroup_iter_next
  and do not take css reference for it.
  
  This reference counting magic protects us also from another issue, an
  endless loop reported by Hugh Dickins when reclaim races with root
  removal and css_tryget called by iterator internally would fail. There
  would be no other nodes to visit so __mem_cgroup_iter_next would return
  NULL and mem_cgroup_iter would interpret it as start looping from root
  again and so mem_cgroup_iter would loop forever internally.
 
 I grabbed these two patches but I will sit on them for a week or so,
 pending review-n-test.

Thank you, yes, I'm about to give them more testing.

 
  Cc: sta...@vger.kernel.org # mem_leak part 3.12+
 
 What does this mean?

It's certainly a confusing comment.

I suggest just deleting the mem_leak part : Michal isn't referring to
any two parts of the patch itself, but to parts of his commit comment;
but it's still unclear what he's claiming.

We do have a confusing situation.  The hang goes back to 3.10 but takes
two different forms, because of intervening changes: in 3.10 and 3.11
mem_cgroup_iter repeatedly returns root memcg to its caller, in 3.12 and
3.13 mem_cgroup_iter repeatedly gets NULL memcg from mem_cgroup_iter_next
and cannot return to its caller.

Patch 1/2 is what's needed to fix 3.10 and 3.11 (and applies correctly
to 3.11, but will have to be rediffed for 3.10 because of rearrangement
in between).  Patch 2/2 is what's needed to fix 3.12 and 3.13 (but applies
correctly to neither of them because it's diffed on top of my CSS_ONLINE
fix).  Patch 1/2 is correct but unnecessary in 3.12 and 3.13: I'm unclear
whether Michal is claiming that it would also fix the hang in 3.12 and
3.13 if we didn't have 2/2: I doubt that, and haven't tested that.

Given how Michal has diffed this patch on top of my CSS_ONLINE one
(mm-memcg-iteration-skip-memcgs-not-yet-fully-initialized.patch),
it would be helpful if you could mark that one also for stable 3.12+,
to save us from having to rediff this one for stable.  We don't have
a concrete example of a problem it solves in the vanilla kernel, but
it makes more sense to include it than to exclude it.

(You would be right to point out that the CSS_ONLINE one fixes
something that goes back to 3.10: I'm saying 3.12+ because I'm not
motivated to rediff it for 3.10 and 3.11 when there's nothing to
go on top; but that's not a very good reason to lie - overrule me.)

Hugh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/