Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim

2014-06-11 Thread Tejun Heo
On Wed, Jun 11, 2014 at 04:11:17PM +0200, Michal Hocko wrote:
> > I still think it'd be less useful than "high", but as there seem to be
> > use cases which can be served with that and especially as a part of a
> > consistent control scheme, I have no objection.
> > 
> > "low" definitely requires a notification mechanism tho.
> 
> Would vmpressure notification be sufficient? That one is in place for
> any memcg which is reclaimed.

Yeah, as long as it can reliably notify userland that the soft
guarantee has been breached, it'd be great as it means we'd have a
single mechanism to monitor both "low" and "high" while "min" and
"max" are oom based, which BTW needs more work but that's a separate
piece of work.
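
For readers following along, here is a minimal userland sketch of how a vmpressure listener is registered with the cgroup v1 memory controller: an eventfd and an open memory.pressure_level descriptor are written, together with a level name, to cgroup.event_control. The struct and function names are assumptions made up for this illustration. Whether such an event reliably signals a breach of the proposed "low" guarantee is exactly the open question in this thread, so the sketch only shows the registration step.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical holder for the descriptors of one registration. */
struct vmpressure_watch {
	int efd;	/* eventfd signalled by the kernel */
	int pfd;	/* memory.pressure_level, kept open while watching */
};

/*
 * Build the registration line "<event_fd> <pressure_fd> <level>".
 * Returns the line length, or -1 on an unknown level or short buffer.
 */
int format_vmpressure_registration(char *buf, size_t len, int efd, int pfd,
				   const char *level)
{
	int n;

	assert(buf && level);
	if (strcmp(level, "low") && strcmp(level, "medium") &&
	    strcmp(level, "critical"))
		return -1;
	n = snprintf(buf, len, "%d %d %s", efd, pfd, level);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Register for pressure events on cgroup_dir; returns 0 or -1. */
int vmpressure_watch_open(const char *cgroup_dir, const char *level,
			  struct vmpressure_watch *w)
{
	char path[4096], line[64];
	int cfd = -1, n;

	assert(cgroup_dir && w);
	w->pfd = -1;
	w->efd = eventfd(0, 0);
	if (w->efd < 0)
		return -1;

	n = snprintf(path, sizeof(path), "%s/memory.pressure_level", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		goto fail;
	w->pfd = open(path, O_RDONLY);
	if (w->pfd < 0)
		goto fail;

	n = snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		goto fail;
	cfd = open(path, O_WRONLY);
	if (cfd < 0)
		goto fail;

	n = format_vmpressure_registration(line, sizeof(line), w->efd, w->pfd,
					   level);
	if (n < 0 || write(cfd, line, (size_t)n) != n)
		goto fail;
	close(cfd);
	return 0;

fail:
	if (cfd >= 0)
		close(cfd);
	if (w->pfd >= 0)
		close(w->pfd);
	close(w->efd);
	w->efd = w->pfd = -1;
	return -1;
}
```

After a successful call, a blocking read(2) of eight bytes from w.efd returns when the kernel next reports pressure at the requested level, so no periodic polling of a stat file is needed.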

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim

2014-06-11 Thread Michal Hocko
On Wed 11-06-14 08:31:09, Tejun Heo wrote:
> Hello, Michal.
> 
> On Wed, Jun 11, 2014 at 09:57:29AM +0200, Michal Hocko wrote:
> > Is this the kind of symmetry Tejun is asking for and that would make
> > change his Nack position? I am still not sure it satisfies his soft
> 
> Yes, pretty much.  What primarily bothered me was the soft/hard
> guarantees being chosen by a toggle switch while the soft/hard limits
> can be configured separately and combined.

The last consensus at LSF was that there would be a knob which will
distinguish hard/best effort behavior. The weaker semantic has strong
usecases IMHO so I wanted to start with it and add a knob for the hard
guarantee later when explicitly asked for.

Going with min, low, high and hard makes more sense to me of course.

> > guarantee objections from other email.
> 
> I was wondering about the usefulness of "low" itself in isolation and

I think it has more usecases than "min" from a simply practical POV. OOM
means potential service downtime and that is a no go. Optimistic
isolation on the other hand provides the advantages of isolation most of
the time while not falling completely flat on an exception (be it
misconfiguration or a corner case like mentioned during the discussion).

That doesn't mean "min" is not useful. It definitely is, the category
of usecases will be more specific though.

> I still think it'd be less useful than "high", but as there seem to be
> use cases which can be served with that and especially as a part of a
> consistent control scheme, I have no objection.
> 
> "low" definitely requires a notification mechanism tho.

Would vmpressure notification be sufficient? That one is in place for
any memcg which is reclaimed.

Or are you thinking about something more like oom_control?

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim

2014-06-11 Thread Tejun Heo
Hello, Michal.

On Wed, Jun 11, 2014 at 09:57:29AM +0200, Michal Hocko wrote:
> Is this the kind of symmetry Tejun is asking for and that would make
> change his Nack position? I am still not sure it satisfies his soft

Yes, pretty much.  What primarily bothered me was the soft/hard
guarantees being chosen by a toggle switch while the soft/hard limits
can be configured separately and combined.

> guarantee objections from other email.

I was wondering about the usefulness of "low" itself in isolation and
I still think it'd be less useful than "high", but as there seem to be
use cases which can be served with that and especially as a part of a
consistent control scheme, I have no objection.

"low" definitely requires a notification mechanism tho.

Thanks.

-- 
tejun


Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim

2014-06-11 Thread Michal Hocko
On Tue 10-06-14 12:57:56, Johannes Weiner wrote:
> On Mon, Jun 09, 2014 at 03:52:51PM -0700, Greg Thelen wrote:
> > 
> > On Fri, Jun 06 2014, Michal Hocko  wrote:
> > 
> > > Some users (e.g. Google) would like to have stronger semantic than low
> > > limit offers currently. The fallback mode is not desirable and they
> > > prefer hitting OOM killer rather than ignoring low limit for protected
> > > groups. There are other possible usecases which can benefit from hard
> > > guarantees. I can imagine workloads where setting low_limit to the same
> > > value as hard_limit to prevent from any reclaim at all makes a lot of
> > > sense because reclaim is much more disrupting than restart of the load.
> > >
> > > This patch adds a new per memcg memory.reclaim_strategy knob which
> > > tells what to do in a situation when memory reclaim cannot do any
> > > progress because all groups in the reclaimed hierarchy are within their
> > > low_limit. There are two options available:
> > > >   - low_limit_best_effort - the current mode when reclaim falls
> > > >     back to the even reclaim of all groups in the reclaimed
> > > >     hierarchy
> > > >   - low_limit_guarantee - groups within low_limit are never
> > > >     reclaimed and OOM killer is triggered instead. OOM message
> > > >     will mention the fact that the OOM was triggered due to
> > > >     low_limit reclaim protection.
> > 
> > To (a) be consistent with existing hard and soft limits APIs and (b)
> > allow use of both best effort and guarantee memory limits, I wonder if
> > it's best to offer three per memcg limits, rather than two limits (hard,
> > low_limit) and a related reclaim_strategy knob.  The three limits I'm
> > thinking about are:
> > 
> > 1) hard_limit (aka the existing limit_in_bytes cgroupfs file).  No
> > >    change needed here.  This is an upper bound on a memcg hierarchy's
> > >    memory consumption (assuming use_hierarchy=1).
> 
> This creates internal pressure.  Outside reclaim is not affected by
> it, but internal charges can not exceed this limit.  This is set to
> hard limit the maximum memory consumption of a group (max).
> 
> > 2) best_effort_limit (aka desired working set).  This allow an
> > >    application or administrator to provide a hint to the kernel about
> > >    desired working set size.  Before oom'ing the kernel is allowed to
> > >    reclaim below this limit.  I think the current soft_limit_in_bytes
> > >    claims to provide this.  If we prefer to deprecate
> > >    soft_limit_in_bytes, then a new desired_working_set_in_bytes (or a
> > >    hopefully better named) API seems reasonable.
> 
> This controls how external pressure applies to the group.
> 
> But it's conceivable that we'd like to have the equivalent of such a
> soft limit for *internal* pressure.  Set below the hard limit, this
> internal soft limit would have charges trigger direct reclaim in the
> memcg but allow them to continue to the hard limit.  This would create
> a situation wherein the allocating tasks are not killed, but throttled
> under reclaim, which gives the administrator a window to detect the
> situation with vmpressure and possibly intervene.  Because as it
> stands, once the current hard limit is hit things can go down pretty
> fast and the window for reacting to vmpressure readings is often too
> small.  This would offer a more gradual deterioration.  It would be
> set to the upper end of the working set size range (high).
> 
> I think for many users such an internal soft limit would actually be
> preferred over the current hard limit, as they'd rather have some
> reclaim throttling than an OOM kill when the group reaches its upper
> bound.  

Yes, this sounds useful. We have already discussed that and the
primary question is whether the high limit reclaim should be direct
or background. There are pros and cons for both. Direct reclaim is
much easier to implement but it is questionable whether it is too
heavy.  Background reclaim is much trickier to implement on the other
hand. Its obvious advantage would be closer convergence with the global
behavior while we still get the notification that something bad is
going on.  I assume that a dedicated workqueue would be doable but we
would definitely need an evaluation of what happens with zillions of
high_limit reclaimers.

> The current hard limit would be reserved for more advanced or paid
> cases, where the admin would rather see a memcg get OOM killed than
> exceed a certain size.

So the hard_limit will not change, right? Still reclaim and fall back to
OOM if nothing can be reclaimed, as we do currently.

> Then, as you proposed, we'd have the soft limit for external pressure,
> where the kernel only reclaims groups within that limit in order to
> avoid OOM kills.  It would be set to the estimated lower end of the
> working set size range (low).

OK, that is how the current low_limit is implemented.

> > 3) low_limit_guarantee which is a lower bound of memory usage.  A memcg
> >would prefer to be oom killed rather t

Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim

2014-06-10 Thread Greg Thelen

On Tue, Jun 10 2014, Johannes Weiner  wrote:

> On Mon, Jun 09, 2014 at 03:52:51PM -0700, Greg Thelen wrote:
>> 
>> On Fri, Jun 06 2014, Michal Hocko  wrote:
>> 
>> > Some users (e.g. Google) would like to have stronger semantic than low
>> > limit offers currently. The fallback mode is not desirable and they
>> > prefer hitting OOM killer rather than ignoring low limit for protected
>> > groups. There are other possible usecases which can benefit from hard
>> > guarantees. I can imagine workloads where setting low_limit to the same
>> > value as hard_limit to prevent from any reclaim at all makes a lot of
>> > sense because reclaim is much more disrupting than restart of the load.
>> >
>> > This patch adds a new per memcg memory.reclaim_strategy knob which
>> > tells what to do in a situation when memory reclaim cannot do any
>> > progress because all groups in the reclaimed hierarchy are within their
>> > low_limit. There are two options available:
>> >   - low_limit_best_effort - the current mode when reclaim falls
>> >     back to the even reclaim of all groups in the reclaimed
>> >     hierarchy
>> >   - low_limit_guarantee - groups within low_limit are never
>> >     reclaimed and OOM killer is triggered instead. OOM message
>> >     will mention the fact that the OOM was triggered due to
>> >     low_limit reclaim protection.
>> 
>> To (a) be consistent with existing hard and soft limits APIs and (b)
>> allow use of both best effort and guarantee memory limits, I wonder if
>> it's best to offer three per memcg limits, rather than two limits (hard,
>> low_limit) and a related reclaim_strategy knob.  The three limits I'm
>> thinking about are:
>> 
>> 1) hard_limit (aka the existing limit_in_bytes cgroupfs file).  No
>>    change needed here.  This is an upper bound on a memcg hierarchy's
>>    memory consumption (assuming use_hierarchy=1).
>
> This creates internal pressure.  Outside reclaim is not affected by
> it, but internal charges can not exceed this limit.  This is set to
> hard limit the maximum memory consumption of a group (max).
>
>> 2) best_effort_limit (aka desired working set).  This allow an
>>    application or administrator to provide a hint to the kernel about
>>    desired working set size.  Before oom'ing the kernel is allowed to
>>    reclaim below this limit.  I think the current soft_limit_in_bytes
>>    claims to provide this.  If we prefer to deprecate
>>    soft_limit_in_bytes, then a new desired_working_set_in_bytes (or a
>>    hopefully better named) API seems reasonable.
>
> This controls how external pressure applies to the group.
>
> But it's conceivable that we'd like to have the equivalent of such a
> soft limit for *internal* pressure.  Set below the hard limit, this
> internal soft limit would have charges trigger direct reclaim in the
> memcg but allow them to continue to the hard limit.  This would create
> a situation wherein the allocating tasks are not killed, but throttled
> under reclaim, which gives the administrator a window to detect the
> situation with vmpressure and possibly intervene.  Because as it
> stands, once the current hard limit is hit things can go down pretty
> fast and the window for reacting to vmpressure readings is often too
> small.  This would offer a more gradual deterioration.  It would be
> set to the upper end of the working set size range (high).
>
> I think for many users such an internal soft limit would actually be
> preferred over the current hard limit, as they'd rather have some
> reclaim throttling than an OOM kill when the group reaches its upper
> bound.  The current hard limit would be reserved for more advanced or
> paid cases, where the admin would rather see a memcg get OOM killed
> than exceed a certain size.
>
> Then, as you proposed, we'd have the soft limit for external pressure,
> where the kernel only reclaims groups within that limit in order to
> avoid OOM kills.  It would be set to the estimated lower end of the
> working set size range (low).
>
>> 3) low_limit_guarantee which is a lower bound of memory usage.  A memcg
>>    would prefer to be oom killed rather than operate below this
>>    threshold.  Default value is zero to preserve compatibility with
>>    existing apps.
>
> And this would be the external pressure hard limit, which would be set
> to the absolute minimum requirement of the group (min).
>
> Either because it would be hopelessly thrashing without it, or because
> this guaranteed memory is actually paid for.  Again, I would expect
> many users to not even set this minimum guarantee but solely use the
> external soft limit (low) instead.
>
>> Logically hard_limit >= best_effort_limit >= low_limit_guarantee.
>
> max >= high >= low >= min
>
> I think we should be able to express all desired usecases with these
> four limits, including the advanced configurations, while making it
> easy for many users to set up groups without being a) dead certain
> about their memory consumption or b) prep

Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim

2014-06-10 Thread Johannes Weiner
On Mon, Jun 09, 2014 at 03:52:51PM -0700, Greg Thelen wrote:
> 
> On Fri, Jun 06 2014, Michal Hocko  wrote:
> 
> > Some users (e.g. Google) would like to have stronger semantic than low
> > limit offers currently. The fallback mode is not desirable and they
> > prefer hitting OOM killer rather than ignoring low limit for protected
> > groups. There are other possible usecases which can benefit from hard
> > guarantees. I can imagine workloads where setting low_limit to the same
> > value as hard_limit to prevent from any reclaim at all makes a lot of
> > sense because reclaim is much more disrupting than restart of the load.
> >
> > This patch adds a new per memcg memory.reclaim_strategy knob which
> > tells what to do in a situation when memory reclaim cannot do any
> > progress because all groups in the reclaimed hierarchy are within their
> > low_limit. There are two options available:
> > - low_limit_best_effort - the current mode when reclaim falls
> >   back to the even reclaim of all groups in the reclaimed
> >   hierarchy
> > - low_limit_guarantee - groups within low_limit are never
> >   reclaimed and OOM killer is triggered instead. OOM message
> >   will mention the fact that the OOM was triggered due to
> >   low_limit reclaim protection.
> 
> To (a) be consistent with existing hard and soft limits APIs and (b)
> allow use of both best effort and guarantee memory limits, I wonder if
> it's best to offer three per memcg limits, rather than two limits (hard,
> low_limit) and a related reclaim_strategy knob.  The three limits I'm
> thinking about are:
> 
> 1) hard_limit (aka the existing limit_in_bytes cgroupfs file).  No
>    change needed here.  This is an upper bound on a memcg hierarchy's
>    memory consumption (assuming use_hierarchy=1).

This creates internal pressure.  Outside reclaim is not affected by
it, but internal charges can not exceed this limit.  This is set to
hard limit the maximum memory consumption of a group (max).

> 2) best_effort_limit (aka desired working set).  This allow an
>    application or administrator to provide a hint to the kernel about
>    desired working set size.  Before oom'ing the kernel is allowed to
>    reclaim below this limit.  I think the current soft_limit_in_bytes
>    claims to provide this.  If we prefer to deprecate
>    soft_limit_in_bytes, then a new desired_working_set_in_bytes (or a
>    hopefully better named) API seems reasonable.

This controls how external pressure applies to the group.

But it's conceivable that we'd like to have the equivalent of such a
soft limit for *internal* pressure.  Set below the hard limit, this
internal soft limit would have charges trigger direct reclaim in the
memcg but allow them to continue to the hard limit.  This would create
a situation wherein the allocating tasks are not killed, but throttled
under reclaim, which gives the administrator a window to detect the
situation with vmpressure and possibly intervene.  Because as it
stands, once the current hard limit is hit things can go down pretty
fast and the window for reacting to vmpressure readings is often too
small.  This would offer a more gradual deterioration.  It would be
set to the upper end of the working set size range (high).

I think for many users such an internal soft limit would actually be
preferred over the current hard limit, as they'd rather have some
reclaim throttling than an OOM kill when the group reaches its upper
bound.  The current hard limit would be reserved for more advanced or
paid cases, where the admin would rather see a memcg get OOM killed
than exceed a certain size.
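
The difference between the proposed internal soft limit (high) and the existing hard limit (max) can be modelled in a few lines. This is only a toy userspace model under stated assumptions: struct memcg_model, model_charge() and the result names are invented for illustration, and CHARGE_HIT_MAX merely stands for "the hard limit path is taken" (reclaim, then OOM if nothing can be reclaimed); no actual reclaim is simulated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of charging pages to a group. */
enum charge_result {
	CHARGE_OK,		/* usage stays at or below high */
	CHARGE_THROTTLED,	/* above high: charge succeeds, task reclaims */
	CHARGE_HIT_MAX,		/* would exceed max: hard limit path */
};

/* Toy model of one group; all values in pages. */
struct memcg_model {
	size_t usage;
	size_t high;
	size_t max;
};

enum charge_result model_charge(struct memcg_model *cg, size_t nr_pages)
{
	assert(cg && cg->high <= cg->max && cg->usage <= cg->max);

	/* Charges can never push usage past the hard limit. */
	if (nr_pages > cg->max - cg->usage)
		return CHARGE_HIT_MAX;

	cg->usage += nr_pages;

	/* Between high and max the charge still succeeds, but throttled. */
	return cg->usage > cg->high ? CHARGE_THROTTLED : CHARGE_OK;
}
```

The band between high and max is the window in which an administrator watching vmpressure could still intervene before the hard limit path is reached.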

Then, as you proposed, we'd have the soft limit for external pressure,
where the kernel only reclaims groups within that limit in order to
avoid OOM kills.  It would be set to the estimated lower end of the
working set size range (low).

> 3) low_limit_guarantee which is a lower bound of memory usage.  A memcg
>would prefer to be oom killed rather than operate below this
>threshold.  Default value is zero to preserve compatibility with
>existing apps.

And this would be the external pressure hard limit, which would be set
to the absolute minimum requirement of the group (min).

Either because it would be hopelessly thrashing without it, or because
this guaranteed memory is actually paid for.  Again, I would expect
many users to not even set this minimum guarantee but solely use the
external soft limit (low) instead.

> Logically hard_limit >= best_effort_limit >= low_limit_guarantee.

max >= high >= low >= min

I think we should be able to express all desired usecases with these
four limits, including the advanced configurations, while making it
easy for many users to set up groups without being a) dead certain
about their memory consumption or b) prepared for frequent OOM kills,
while still allowing them to properly utilize their machines.
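
As a small illustration of the ordering above, a configuration check for the four limits might look like the following. The struct and function names are assumptions for this sketch only; the thread does not say whether or where the kernel would enforce the ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the four proposed per-memcg limits (bytes). */
struct memcg_limits {
	unsigned long long min;		/* external pressure, hard guarantee */
	unsigned long long low;		/* external pressure, soft guarantee */
	unsigned long long high;	/* internal pressure, soft limit */
	unsigned long long max;		/* internal pressure, hard limit */
};

/* True if the limits satisfy max >= high >= low >= min. */
bool limits_are_consistent(const struct memcg_limits *l)
{
	assert(l != NULL);
	return l->max >= l->high && l->high >= l->low && l->low >= l->min;
}
```

A group that sets only "low" and leaves min at zero passes the check, which matches the expectation that many users would never configure the minimum guarantee.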

What do you think?

Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim

2014-06-09 Thread Greg Thelen

On Fri, Jun 06 2014, Michal Hocko  wrote:

> Some users (e.g. Google) would like to have stronger semantic than low
> limit offers currently. The fallback mode is not desirable and they
> prefer hitting OOM killer rather than ignoring low limit for protected
> groups. There are other possible usecases which can benefit from hard
> guarantees. I can imagine workloads where setting low_limit to the same
> value as hard_limit to prevent from any reclaim at all makes a lot of
> sense because reclaim is much more disrupting than restart of the load.
>
> This patch adds a new per memcg memory.reclaim_strategy knob which
> tells what to do in a situation when memory reclaim cannot do any
> progress because all groups in the reclaimed hierarchy are within their
> low_limit. There are two options available:
>   - low_limit_best_effort - the current mode when reclaim falls
>     back to the even reclaim of all groups in the reclaimed
>     hierarchy
>   - low_limit_guarantee - groups within low_limit are never
>     reclaimed and OOM killer is triggered instead. OOM message
>     will mention the fact that the OOM was triggered due to
>     low_limit reclaim protection.
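
The two modes described in the quoted changelog can be summarised as a decision function. This is a userspace sketch, not the patch's code: struct group_model, decide_reclaim() and the enum names are assumptions, and only the hard_low_limit flag name is borrowed from the patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one reclaim attempt over a hierarchy. */
enum reclaim_action {
	RECLAIM_UNPROTECTED,	/* some group is above its low_limit */
	RECLAIM_ALL_EVENLY,	/* best effort: fall back to all groups */
	TRIGGER_OOM,		/* guarantee: protected groups stay untouched */
};

/* Toy view of one group in the reclaimed hierarchy. */
struct group_model {
	size_t usage;
	size_t low_limit;
};

enum reclaim_action decide_reclaim(const struct group_model *groups,
				   size_t nr, bool hard_low_limit)
{
	size_t i;

	assert(groups || nr == 0);

	/* Normal case: at least one group exceeds its protection. */
	for (i = 0; i < nr; i++)
		if (groups[i].usage > groups[i].low_limit)
			return RECLAIM_UNPROTECTED;

	/* Every group is within its low_limit: the knob decides. */
	return hard_low_limit ? TRIGGER_OOM : RECLAIM_ALL_EVENLY;
}
```

The knob only matters in the last line, that is, when reclaim cannot make progress without touching a protected group.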

To (a) be consistent with existing hard and soft limits APIs and (b)
allow use of both best effort and guarantee memory limits, I wonder if
it's best to offer three per memcg limits, rather than two limits (hard,
low_limit) and a related reclaim_strategy knob.  The three limits I'm
thinking about are:

1) hard_limit (aka the existing limit_in_bytes cgroupfs file).  No
   change needed here.  This is an upper bound on a memcg hierarchy's
   memory consumption (assuming use_hierarchy=1).

2) best_effort_limit (aka desired working set).  This allow an
   application or administrator to provide a hint to the kernel about
   desired working set size.  Before oom'ing the kernel is allowed to
   reclaim below this limit.  I think the current soft_limit_in_bytes
   claims to provide this.  If we prefer to deprecate
   soft_limit_in_bytes, then a new desired_working_set_in_bytes (or a
   hopefully better named) API seems reasonable.

3) low_limit_guarantee which is a lower bound of memory usage.  A memcg
   would prefer to be oom killed rather than operate below this
   threshold.  Default value is zero to preserve compatibility with
   existing apps.

Logically hard_limit >= best_effort_limit >= low_limit_guarantee.


Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim

2014-06-09 Thread Tejun Heo
Hello,

On Mon, Jun 09, 2014 at 10:30:42AM +0200, Michal Hocko wrote:
> On Fri 06-06-14 11:29:14, Tejun Heo wrote:
> > Why is this necessary?
> 
> It allows user/admin to set the default behavior.

By recompiling the kernel for something which can be trivially
configured post-boot without any difference?  The only thing it'll
achieve is confusing the hell out of people why different kernels show
different behaviors without any userland differences while taxing the
already constrained kernel configuration process more for no gain
whatsoever.

> How do you propose to tell the default then? Only at the runtime?
> I really do not insist on the kconfig. I find it useful for a)
> documentation purpose b) easy way to configure the default.

Please don't ever add Kconfig options like this.  This is utterly
unnecessary and idiotic.  You don't add completely redundant Kconfig
option for documentation purposes.

> > * Are you sure soft and hard guarantees aren't useful when used in
> >   combination?  If so, why would that be the case?
> 
> This was a call from Google to have per-memcg setup AFAIR. Using
> different reclaim protection on the global case vs. limit reclaim makes
> a lot of sense to me. If this is a major obstacle then I am OK to drop
> it and only have a global setting for now.

Isn't it obvious that what needs to be investigated is why we're
trying to add an interface which is completely different for
guarantees as compared to limits?  Why wouldn't they have a symmetric
interface in the reverse direction as soft/hard limits?  If not, where
does the asymmetry come from?  These are the *first* questions which
should come to anyone's mind when [s]he is trying to add configs for a
different type of thresholds and something which must be explicitly
laid out as rationales for the design choices.

> > * We have pressure monitoring interface which can be used for soft
> >   limit pressure monitoring. 
> 
> Which one is that? I only know about oom_control triggered by the hard
> limit pressure.

Weren't you guys planning to use vmpressure notification to find out
about softlimit breach conditions?

> >   How should breaching soft guarantee be
> >   factored into that?  There doesn't seem to be any way of notifying
> >   that at the moment?  Wouldn't we want that to be integrated into the
> >   same mechanism?
> 
> Yes, there is. We have a counter in memory.stat file which tells how
> many times the limit has been breached.

How does the userland find out?  By polling the file every frigging
second?  Note that there actually is an actual asymmetry here which
makes breaching soft guarantee a much more significant event than
breaching soft limit - the former is violation of the configured
objective, the latter is not.  You *need* a way to notify the event.

> > What scares me the most is that you don't even seem to have noticed
> > the asymmetry and are proposing userland-facing interface without
> > actually thinking things through.  This is exactly how we've been
> > getting into trouble.
> 
> This has been discussed up and down for the last _two_ years. I have
> considered other options how to provide a very _useful_ feature users
> are calling for. There is even general consensus among developers that

AFAIR, there hasn't been much discussion about the details of the
interface and the proposed one is almost laughable.  How is this
acceptable as a userland visible API that we need to maintain for the
future?  It's broken on delivery.

> the feature is desirable and that the two modes (soft/hard) memory
> protection are needed. Yet I would _really_ like to hear any
> suggestion to get unstuck. It is far from useful to come and Nack this
> _again_ without providing any alternative suggestions.

I've pointed out two major points where the proposed interface is
evidently deficient and told you why they're so and it's not like the
said deficiencies are anything subtle.  If you can't figure out what
to do next from there on, I don't think I can help you.

Thanks.

-- 
tejun


Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim

2014-06-09 Thread Michal Hocko
On Fri 06-06-14 11:29:14, Tejun Heo wrote:
> Hello, Michal.
> 
> On Fri, Jun 06, 2014 at 04:46:50PM +0200, Michal Hocko wrote:
> > +choice
> > +   prompt "Memory Resource Controller reclaim protection"
> > +   depends on MEMCG
> > +   help
> 
> Why is this necessary?

It allows user/admin to set the default behavior.

> - This doesn't affect boot.
> 
> - memcg requires runtime config *anyway*.
> 
> - The config is inherited from the parent, so the default flipping
>   isn't exactly difficult.
> 
> Please drop the kconfig option.

How do you propose to tell the default then? Only at the runtime?
I really do not insist on the kconfig. I find it useful for a)
documentation purpose b) easy way to configure the default.

> > +static int mem_cgroup_write_reclaim_strategy(struct cgroup_subsys_state *css, struct cftype *cft,
> > +   char *buffer)
> > +{
> > +   struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> > +   int ret = 0;
> > +
> > +   if (!strncmp(buffer, "low_limit_guarantee",
> > +   sizeof("low_limit_guarantee"))) {
> > +   memcg->hard_low_limit = true;
> > +   } else if (!strncmp(buffer, "low_limit_best_effort",
> > +   sizeof("low_limit_best_effort"))) {
> > +   memcg->hard_low_limit = false;
> > +   } else
> > +   ret = -EINVAL;
> > +
> > +   return ret;
> > +}
> 
> So, ummm, this raises a big red flag for me.  You're now implementing
> two behaviors in a mostly symmetric manner to soft/hard limits but
> choosing a completely different scheme in how they're configured
> without any rationale.

So what is your suggestion then? Using a global setting? Using a
separate knob? Something completely different?

> * Are you sure soft and hard guarantees aren't useful when used in
>   combination?  If so, why would that be the case?

This was a call from Google to have per-memcg setup AFAIR. Using
different reclaim protection on the global case vs. limit reclaim makes
a lot of sense to me. If this is a major obstacle then I am OK to drop
it and only have a global setting for now.

> * We have pressure monitoring interface which can be used for soft
>   limit pressure monitoring. 

Which one is that? I only know about oom_control triggered by the hard
limit pressure.

>   How should breaching soft guarantee be
>   factored into that?  There doesn't seem to be any way of notifying
>   that at the moment?  Wouldn't we want that to be integrated into the
>   same mechanism?

Yes, there is. We have a counter in memory.stat file which tells how
many times the limit has been breached.

> What scares me the most is that you don't even seem to have noticed
> the asymmetry and are proposing userland-facing interface without
> actually thinking things through.  This is exactly how we've been
> getting into trouble.

This has been discussed up and down for the last _two_ years. I have
considered other options how to provide a very _useful_ feature users
are calling for. There is even general consensus among developers that
the feature is desirable and that the two modes (soft/hard) memory
protection are needed. Yet I would _really_ like to hear any
suggestion to get unstuck. It is far from useful to come and Nack this
_again_ without providing any alternative suggestions.

> For now, for everything.
> 
>  Nacked-by: Tejun Heo 
> 
> Thanks.
> 
> -- 
> tejun

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim

2014-06-06 Thread Tejun Heo
A bit of addition.

Let's *please* think through how memcg should be configured and
different knobs / limits interact with each other and come up with a
consistent scheme before adding more shits on top.  This "oh I know
this use case and maybe that behavior is necessary too, let's add N
different and incompatible ways to mix and match them" doesn't fly.
Aren't we supposed to at least have learned that already?

-- 
tejun


Re: [PATCH 2/2] memcg: Allow hard guarantee mode for low limit reclaim

2014-06-06 Thread Tejun Heo
Hello, Michal.

On Fri, Jun 06, 2014 at 04:46:50PM +0200, Michal Hocko wrote:
> +choice
> + prompt "Memory Resource Controller reclaim protection"
> + depends on MEMCG
> + help

Why is this necessary?

- This doesn't affect boot.

- memcg requires runtime config *anyway*.

- The config is inherited from the parent, so the default flipping
  isn't exactly difficult.

Please drop the kconfig option.

> +static int mem_cgroup_write_reclaim_strategy(struct cgroup_subsys_state *css, struct cftype *cft,
> + char *buffer)
> +{
> + struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> + int ret = 0;
> +
> + if (!strncmp(buffer, "low_limit_guarantee",
> + sizeof("low_limit_guarantee"))) {
> + memcg->hard_low_limit = true;
> + } else if (!strncmp(buffer, "low_limit_best_effort",
> + sizeof("low_limit_best_effort"))) {
> + memcg->hard_low_limit = false;
> + } else
> + ret = -EINVAL;
> +
> + return ret;
> +}
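
For reference, the string matching in the quoted handler compares with strncmp() over sizeof() of each literal, which includes the terminating NUL, so only an exact match is accepted. A userspace sketch of the same parsing, written table-driven and tolerating one trailing newline, could look like this. parse_reclaim_strategy() and the table are hypothetical names for illustration; this is not the patch's code, and it makes no claim about whether the cgroup core strips whitespace before calling the handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse a reclaim strategy name into the hard_low_limit flag.
 * Accepts the exact mode names, optionally followed by one newline.
 * Returns 0 on success, -EINVAL on anything else.
 */
int parse_reclaim_strategy(const char *buffer, bool *hard_low_limit)
{
	static const struct {
		const char *name;
		bool hard;
	} modes[] = {
		{ "low_limit_guarantee", true },
		{ "low_limit_best_effort", false },
	};
	size_t len, i;

	assert(buffer && hard_low_limit);

	/* Length of the token up to an optional trailing newline. */
	len = strcspn(buffer, "\n");
	if (buffer[len] == '\n' && buffer[len + 1] != '\0')
		return -EINVAL;

	for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		if (strlen(modes[i].name) == len &&
		    !memcmp(buffer, modes[i].name, len)) {
			*hard_low_limit = modes[i].hard;
			return 0;
		}
	}
	return -EINVAL;
}
```

Keeping the names in one table means a further mode would be a one-line addition rather than another else-if branch.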

So, ummm, this raises a big red flag for me.  You're now implementing
two behaviors in a mostly symmetric manner to soft/hard limits but
choosing a completely different scheme in how they're configured
without any rationale.

* Are you sure soft and hard guarantees aren't useful when used in
  combination?  If so, why would that be the case?

* We have pressure monitoring interface which can be used for soft
  limit pressure monitoring.  How should breaching soft guarantee be
  factored into that?  There doesn't seem to be any way of notifying
  that at the moment?  Wouldn't we want that to be integrated into the
  same mechanism?

What scares me the most is that you don't even seem to have noticed
the asymmetry and are proposing userland-facing interface without
actually thinking things through.  This is exactly how we've been
getting into trouble.

For now, for everything.

 Nacked-by: Tejun Heo 

Thanks.

-- 
tejun