Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-06 Thread Thomas Gleixner
On Tue, 5 May 2015, Tejun Heo wrote:
> On Tue, May 05, 2015 at 08:29:28PM +0200, Thomas Gleixner wrote:
> > As Peter said several times: hard failure is good and desired. It's
> > very clear information which people can act on. If the failure
> > modes are nilly-willy today, as you wrote somewhere, then we need to
> > fix that and make them consistent and understandable and not replace
> > them by half-baked heuristics which postpone the failure to some point
> > where it is even less understandable.
> 
> There are no such magic heuristics because controllers need well
> defined behaviors when current is above limit anyway and behave
> exactly the same way no matter how that state is reached.  For

How would something go above limit in the first place if your resource
management is done properly?

  If a group has a resource limit, then it is not allowed to exceed
  that resource. So any attempt to use more resources must fail,
  period. There is no way to go above the limit.

  If you try to lower the limits of an existing group below the level
  which is already used, then this limit restriction attempt must
  fail.

That's the basic principle of resource management. And if you try to
avoid it, then you have a massive design failure. It's that simple.
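
As a rough illustration, the two rules could be modelled like this. It
is a toy Python sketch, not kernel code; the class, method and exception
names are hypothetical and chosen only for this example.

```python
class LimitExceeded(Exception):
    """Raised when a request would violate the group's limit."""


class StrictGroup:
    """Toy resource group whose usage can never exceed its limit."""

    def __init__(self, limit):
        self.limit = limit
        self.usage = 0

    def charge(self, amount):
        # Rule 1: an attempt to use more than the limit fails outright.
        if self.usage + amount > self.limit:
            raise LimitExceeded("charge would exceed limit")
        self.usage += amount

    def set_limit(self, new_limit):
        # Rule 2: the limit cannot be lowered below what is already used.
        if new_limit < self.usage:
            raise LimitExceeded("limit below current usage")
        self.limit = new_limit
```

With both checks in place there is no code path that leaves usage above
the limit, which is the property argued for here.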

Thanks,

tglx




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-06 Thread Peter Zijlstra
On Tue, May 05, 2015 at 03:06:03PM -0400, Tejun Heo wrote:
> Hello, Peter.
> 
> On Tue, May 05, 2015 at 09:00:57PM +0200, Peter Zijlstra wrote:
> > On Tue, May 05, 2015 at 12:31:12PM -0400, Tejun Heo wrote:
> > > What I don't want to happen is controllers failing migrations
> > > willy-nilly for random reasons leaving users baffled, which we've
> > > actually been doing unfortunately.  Maybe we need to deal with this
> > > fixed resource arbitration as a separate class and allow them to fail
> > > migration w/ -EBUSY.
> > 
> > Ah, _that_ was the problem.
> > 
> > Which is something created by this co-mounting of controllers.
> 
> Yeah, partly, but also that it's an extra failure mode which isn't
> necessary for most controllers.

I can agree with reducing failure modes, but we should not do it at the
cost of functionality.

> > You could of course store the ss-id of the failing operation in
> > task_struct and have a file reporting the name of the ss-id.
> > 
> > That way, there is a simple way to find out which controller failed the
> > migrate.
> 
> Given that the resources which can fail are very limited, I don't
> think we need that right now as long as we limit and document the
> possible failure cases clearly.  Hopefully, this won't devolve into
> collection of arbitrary failures.

Right, but something like that would be fairly trivial to implement and
would give immediate resolution.

For example:

$ echo 123 > /cgroups/monkey/business/tasks
-EBUSY
$ cat /cgroups/monkey/business/errno
cpu:-EBUSY

(in fact, for a trivial implementation it doesn't matter which
cgroup/errno you cat)
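
A minimal sketch of that idea, as a toy Python model rather than kernel
code. The Task class, the last_failed_ss field and both functions are
hypothetical; the only thing taken from the example above is the
"cpu:-EBUSY" output format.

```python
import errno


class Task:
    def __init__(self, pid):
        self.pid = pid
        # Hypothetical per-task record of the controller that refused
        # the last migration, stored as (subsystem name, negative errno).
        self.last_failed_ss = None


def attach(task, controllers):
    """Ask each controller in turn; remember the first one that refuses.

    `controllers` maps a subsystem name to a callable that returns 0 or
    a negative errno value.
    """
    for name, can_attach in controllers.items():
        ret = can_attach(task)
        if ret:
            task.last_failed_ss = (name, ret)
            return ret
    task.last_failed_ss = None
    return 0


def read_errno_file(task):
    """Render the record in the 'name:-ERRNO' form shown above."""
    if task.last_failed_ss is None:
        return ""
    name, ret = task.last_failed_ss
    return "%s:-%s" % (name, errno.errorcode[-ret])
```

Because the record lives with the task and not with a cgroup, reading
it back does not depend on which cgroup directory the file is read from.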


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Peter.

On Tue, May 05, 2015 at 09:00:57PM +0200, Peter Zijlstra wrote:
> On Tue, May 05, 2015 at 12:31:12PM -0400, Tejun Heo wrote:
> > What I don't want to happen is controllers failing migrations
> > willy-nilly for random reasons leaving users baffled, which we've
> > actually been doing unfortunately.  Maybe we need to deal with this
> > fixed resource arbitration as a separate class and allow them to fail
> > migration w/ -EBUSY.
> 
> Ah, _that_ was the problem.
> 
> Which is something created by this co-mounting of controllers.

Yeah, partly, but also that it's an extra failure mode which isn't
necessary for most controllers.

> You could of course store the ss-id of the failing operation in
> task_struct and have a file reporting the name of the ss-id.
> 
> That way, there is a simple way to find out which controller failed the
> migrate.

Given that the resources which can fail are very limited, I don't
think we need that right now as long as we limit and document the
possible failure cases clearly.  Hopefully, this won't devolve into
collection of arbitrary failures.

Thanks.

-- 
tejun


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Peter Zijlstra
On Tue, May 05, 2015 at 12:31:12PM -0400, Tejun Heo wrote:
> 
> What I don't want to happen is controllers failing migrations
> willy-nilly for random reasons leaving users baffled, which we've
> actually been doing unfortunately.  Maybe we need to deal with this
> fixed resource arbitration as a separate class and allow them to fail
> migration w/ -EBUSY.

Ah, _that_ was the problem.

Which is something created by this co-mounting of controllers.

You could of course store the ss-id of the failing operation in
task_struct and have a file reporting the name of the ss-id.

That way, there is a simple way to find out which controller failed the
migrate.



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Thomas.

On Tue, May 05, 2015 at 08:29:28PM +0200, Thomas Gleixner wrote:
> I fully agree and after reading through this thread I really have to
> say that this whole notion of relax the admission control and then try
> to magically converge to the resource limits is horrible in all
> aspects.

This comes down to controllers allowing limits to be configured below
current usage.  We need to allow and define what happens in that
situation and moving a process into a full cgroup inherently follows
the same pattern albeit from the other direction.

> The idea of allowing overcommitment and magically converging back
> to the limits yells heuristics all over the place and we all know how
> reliable heuristics are.

It's not magic heuristics.  This is a core part of normal operation.

> As Peter said several times: hard failure is good and desired. It's
> very clear information which people can act on. If the failure
> modes are nilly-willy today, as you wrote somewhere, then we need to
> fix that and make them consistent and understandable and not replace
> them by half-baked heuristics which postpone the failure to some point
> where it is even less understandable.

There are no such magic heuristics because controllers need well
defined behaviors when current is above limit anyway and behave
exactly the same way no matter how that state is reached.  For
resources like RR slices, this doesn't work and that's why this is an
issue, so yeah this is the process of finding out what must be able to
fail.

> If there are issues with run-away problems, i.e. upping a resource
> limit which gets eaten up from the existing tasks before you can admit
> a new one, then your magic convergence thing is again the wrong
> answer. The right approach is:
> 
>   1) Up the limit and make a reservation at the same time
>   2) Admit the new task and allow it to consume the reservation
>   3) Set it effective

I don't really think this is a scenario we need to worry about.  If we
choose to fail migration, let's just fail it.  There's no point in
building a mechanism to work around malbehavior from its users.

> > Are you really going to force us to abandon cgroups and invent yet
> > another grouping thing?
> 
> Sigh no. I think cgroups can be fixed, if we just adhere to the basic
> principles of hierarchical resource management and remove/reject all
> magic "we'll fix that for you" nonsense.

So, let's do -EBUSY for hard resource failures which have to be exact.

Thanks.

-- 
tejun


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, again.

On Tue, May 05, 2015 at 06:50:06PM +0200, Peter Zijlstra wrote:
> I really don't get what you're saying there. If it's not allowed to
> 'escape' there must be some equivalent of can_attach().
> 
> Otherwise you simply cannot reject the move.

A given user isn't allowed to move processes into a cgroup outside its
subhierarchy and the hierarchical resource control keeps the
subhierarchy under the limits no matter what the user does inside it.
Whether can_attach can fail or not is peripheral in this sense - if a
user can move processes into a cgroup outside its allowed scope, the
user can already escape regardless of the specifics of configuration.

Any user of cgroups should be confined to its scope and when it's
confined that way, the hierarchical limits are enforced no matter what
happens in its subhierarchy.

> > Furthermore, in the majority of use cases, organizational operations
> > are used to set up the hierarchy when starting up a group and then
> > left alone.  For stateful controllers like memcg, process migrations
> > are inherently expensive and intrusive, so the usage model isn't
> > arbitrary.  This is a corner case issue and doesn't really affect the
> > whole model.
> 
> Again, I don't follow, so why is can_attach() bad?

It's more like can_attach failures don't add much for other
controllers.  Please see below.

> People should not unknowingly let programs use RR/FIFO. Also what sorts
> of 'problems' are people having because of this? What kind of
> applications 'require' RR/FIFO on a normal desktop?

The cases I hear about are mostly audio applications which end up in
whatever default cgroups other applications are put in w/o an easy way
to configure the hierarchy for RR slices.  As I wrote way back, if
these can't be decoupled, whoever is setting up cpu cgroup hierarchies
will also have to take part in distributing realtime slices.

This might not necessarily be a bad thing.  It's just different from
everything else cgroups deal with at this point.

> > I don't get this part.  How does making organization supersede
> > configuration destroy hierarchy?
> 
> If you want to unconditionally allow task migration between groups, the
> hierarchy doesn't actually mean anything.
>
> You can't enforce hierarchical constraints. Which to me is the entire
> point of having a hierarchy.

No, hierarchy still puts restrictions on who can do what where.
Whether organization operations supersede configurations or not
doesn't affect this at all.  Again, if you can stow away processes out
of your domain, you're escaping the hierarchical constraints all the
same.  Delegations need to be scoped no matter what.

> > This can't be ratio-distributed or
> > soft-capped and having to tie this together with regular cpu
> > controller is annoying.
> 
> Welcome to actual world issues. Stop pretending this stuff is easy and
> can be hidden from the user.
> 
> IF people want to use RR/FIFO they had better damn well know what
> they're doing. There is no way around that. There's just too many
> things that can go wrong with it.
> 
> If they don't want to deal with these problems, then tell them to go
> away. Do _NOT_ pretend it's easy and fudge it for them.
> 
> This on-demand carving thing you mention, that's a _MASSIVE_ fudge. Just
> don't even go there.

How is on-demand allocation fudging?  You can do it manually or you
can have policies set up to allocate the specific resource.  This is
really beside the point tho.  What I was trying to say was that this
takes a different approach from other non-hard resources.

> > Well, let's agree to disagree on that one.  It's not about allowing
> > willy nilly everything but separating out the specification of intent
> > from the current state and you also saw how coupling the two tightly
> > messed up cpuset.  It can make configuration tedious enough to the
> > point where it becomes impractical to use under certain circumstances.
> 
> Well, no I didn't see how cpusets was messed up. You see that is where
> we start to disagree.

Yeah, seems that way.  Let's agree to disagree here.

> The improvement I wanted to cpusets was to simply disallow hotplug when
> there were tasks that could not go elsewhere.

Would that mean we're also gonna disallow hotunplug if some threads
are pinned to that cpu?  And the kernel would still be changing
configurations in a non-reversible way.  Again, how does that jibe
with plain affinities?

> That said, this is not the point we're now arguing about; I want the
> hierarchy to actually mean something, and the only way to do that is to
> allow can_attach().
> 
> Without can_attach() one cannot provide hierarchical constraints.

I don't think this is the point either.  The point is how to deal with
hard resources that can't be permissive by default.

> > > Also, who's the one doing a PID controller which will hard fail fork?
> > > How are you going to do away with can_attach() there? Surely you need to
> > > dis-allow another task joining when 

Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Thomas Gleixner
On Tue, 5 May 2015, Peter Zijlstra wrote:
> On Tue, May 05, 2015 at 12:13:35PM -0400, Tejun Heo wrote:
> > On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote:
> > > 
> > > So no; hard failure is good and desired. It allows guarantees, which is
> > > a good and desired feature of control.
> > 
> > Isn't that too sweeping a statement?  We want them in some places but
> > not necessarily in all places.  The hard failures aren't going away.
> > They're just localized to specific areas where they're easier to
> > handle.
> 
> Easier how? I'm really not seeing how any of this is making things
> easier for anybody.
> 
> All I'm seeing is that you're making cgroups useless for people who want
> to guarantee things (eg. the realtime people).

I fully agree and after reading through this thread I really have to
say that this whole notion of relax the admission control and then try
to magically converge to the resource limits is horrible in all
aspects.

Hierarchies must have a strictly inherited and overall consistent
resource management and therefore resource limitation. Otherwise they
are just useless.

The idea of allowing overcommitment and magically converging back
to the limits yells heuristics all over the place and we all know how
reliable heuristics are.

Tejun, you try to make the whole configuration and placement simpler
for the user, but all you achieve is that you act like all these
politicians who promise tax cuts and whatever and forget about them
once the elections are over. How is that going to make stuff simpler
for users/admins? Not at all.

Instead of failing hard at placement/configuration time they get
surprised by hard to understand fallout of magic convergence
heuristics. That's crap and no matter how you argue it stays crap.

As Peter said several times: hard failure is good and desired. It's
very clear information which people can act on. If the failure
modes are nilly-willy today, as you wrote somewhere, then we need to
fix that and make them consistent and understandable and not replace
them by half-baked heuristics which postpone the failure to some point
where it is even less understandable.

If there are issues with run-away problems, i.e. upping a resource
limit which gets eaten up from the existing tasks before you can admit
a new one, then your magic convergence thing is again the wrong
answer. The right approach is:

  1) Up the limit and make a reservation at the same time
  2) Admit the new task and allow it to consume the reservation
  3) Set it effective

You can apply this to ALL sorts of resource controllers and you give
the user a very simple to understand mechanism to control and
configure his system.
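
The three steps can be sketched as a toy Python model. This is not
kernel code: the Group class and its method names are hypothetical, and
reading step 3 as "release what is left of the reservation" is an
assumption made for this example.

```python
class Group:
    """Toy resource group with a reservation held for an incoming task."""

    def __init__(self, limit):
        self.limit = limit
        self.usage = 0
        self.reserved = 0

    def charge(self, amount):
        # Tasks already in the group may only use unreserved headroom.
        if self.usage + amount > self.limit - self.reserved:
            return False
        self.usage += amount
        return True

    def raise_limit_with_reservation(self, extra):
        # Step 1: raise the limit and reserve the new headroom together,
        # so existing tasks cannot eat it up in between.
        self.limit += extra
        self.reserved += extra

    def admit(self, amount):
        # Step 2: the new task consumes the reservation.
        if amount > self.reserved:
            return False
        self.reserved -= amount
        self.usage += amount
        return True

    def commit(self):
        # Step 3 (assumed meaning): drop the remaining reservation so
        # the new limit applies to everyone.
        self.reserved = 0
```

For example, a group at usage 10 of limit 10 that is raised by 4 still
refuses charges from existing tasks until the new task has been
admitted and the change is committed.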

> Are you really going to force us to abandon cgroups and invent yet
> another grouping thing?

Sigh no. I think cgroups can be fixed, if we just adhere to the basic
principles of hierarchical resource management and remove/reject all
magic "we'll fix that for you" nonsense.

Thanks,

tglx


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Peter.

On Tue, May 05, 2015 at 04:10:49PM +0200, Peter Zijlstra wrote:
> Imagine:
> 
>         root
>         /  \
>        A    B
>       / \  / \
>     a1  a2 b1  b2
> 
> Now if they all have -1, I cannot set a bw on any except the leaf nodes
> ([ab][12]). Because the sum of child bw must strictly be smaller or
> equal to the parent bandwidth, and -1 is effectively inf.
> 
> Similarly, if A has bw enabled I cannot create a new child with -1.
> Because above.
> 
> Now you can kludge around some of this, for example you can make the
> default depend on the parent setting etc.. But that's horribly
> inconsistent.

I don't think we can kludge this.  For all other resources, we're
defining the limits that can't be crossed so nesting them w/ -1 by
default is fine.  RR slices are different in that we're really slicing
up and guaranteeing a portion of something finite, so unlimited by
default thing doesn't really work here.
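
The constraint Peter describes (sum of child bandwidth must not exceed
the parent's, with -1 counting as infinite) can be checked with a toy
Python sketch. The tree layout follows his diagram; the function names
and the dict-based representation are hypothetical, not the scheduler's
actual data structures.

```python
import math

UNLIMITED = -1


def effective(bw):
    """Treat -1 as an infinite bandwidth."""
    return math.inf if bw == UNLIMITED else bw


def can_set(tree, bw, node, value):
    """Return True if bw[node] = value keeps every parent's bandwidth
    greater than or equal to the sum of its children's.

    `tree` maps each node to its list of children, `bw` maps each node
    to its configured bandwidth.
    """
    trial = dict(bw)
    trial[node] = value
    for parent, children in tree.items():
        if not children:
            continue
        child_sum = sum(effective(trial[c]) for c in children)
        if child_sum > effective(trial[parent]):
            return False
    return True
```

With every node at -1, setting a finite value on A is refused because
its children still sum to infinity, while setting one on a1 is
accepted. Once A is finite, a child of A cannot be set to -1.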

> So I really prefer not to go that way; if people use RR/FIFO they had
> better bloody know what they're doing; which includes setting up the
> system.

The problem is that this is tied to the normal cpu controller.  Users
who don't have any intention of mucking with RT scheduling end up
being dragged into it.  Given the strict nature of RR slicing, I
don't even think it's actually useful to make the slicing
hierarchical.  From cgroup's POV, it'd be best if RR slicing can be
detached.

> The whole RR/FIFO thing is so enormously broken (by definition; this
> truly is unfixable) that you simply _cannot_ automate it.

Yeah, exactly.

Thanks.

-- 
tejun


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Peter.

On Mon, May 04, 2015 at 02:37:38PM +0200, Peter Zijlstra wrote:
> > I just realized we allow removing/adding controllers from/to cgroups
> > while there are tasks in them, which isn't safe unless we eliminate all
> > can_attach callbacks. We've done so for some cgroup subsystems, but
> > there are still a few of them...
> 
> You can't remove can_attach(), we must be able to disallow joining a
> cgroup.
> 
> If that results in you not being able to change the cgroup setup with
> tasks in, so be it -- that seems like a sane restriction anyhow.

This is really an interface policy issue.  For all other controllers,
it's almost trivial to let organizational operations (setting up
hierarchies, moving processes around) overrule controller
configurations.  The main benefit of doing this is that this decouples
organizational operations from resource control.  Users can depend on
the fact that allowed organizational operations won't fail due to
specific controller configuration issues.

This also works well with controllers accepting target configurations
regardless of the current state and enforcing rules to converge to the
configured state instead.  e.g. if you set max memory lower than the
currently used, the config will be accepted and the controller will
keep trying to make the current state converge to the target state.
This is important as rejecting configuration can lead to chasing game
between configuration attempts and run-away resource consumption.
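
A toy Python sketch of this "accept the target, then converge" model,
for contrast with a controller that rejects the configuration. It is
not how memcg is implemented; the class and the reclaim_step() helper
are hypothetical.

```python
class ConvergingGroup:
    """Toy controller that accepts any limit and works toward it."""

    def __init__(self, limit):
        self.limit = limit
        self.usage = 0

    def set_limit(self, new_limit):
        # Always accepted, even when it is below current usage.
        self.limit = new_limit

    def charge(self, amount):
        # New consumption is refused while it would exceed the target.
        if self.usage + amount > self.limit:
            return False
        self.usage += amount
        return True

    def reclaim_step(self, reclaimable):
        # Move usage toward the target by at most `reclaimable` units
        # and report how much was actually freed.
        excess = max(0, self.usage - self.limit)
        freed = min(excess, reclaimable)
        self.usage -= freed
        return freed
```

Here the group can temporarily sit above its limit after set_limit(),
and whether it ever gets back under depends on how much each
reclaim_step() can free.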

Now, RR slices are the special case here because it's inherently
different from every other resource cgroup is concerned with.  It
simply doesn't fit into the same model that other resources follow.
There are several options we can try.

1. Decouple RR slices from cpu controller.  This would be the best
   route to follow.  RR slices need a hard allocator no matter what we
   do.  There isn't much point in imposing hierarchical structure on
   top of it.

2. Implement special case behavior so that it can follow the same
   model.  e.g. resetting RR scheduling config when the effective cpu
   cgroup changes or carrying the amount of slice being consumed with
   the process being moved.  No matter how this is done, it's gonna be
   a clear compromise as we're forcing this into the model which
   doesn't quite fit it.  That said, given how RR slices are a special
   case to begin with, I think this can be acceptable.

3. Take compromise in the other direction - add exceptions to
   organizational operations but clearly limit the failure modes.  We
   prolly want to structure code in a way to enforce this.

4. If #1 can be done in time but not right now, simply disallow any
   RR/FIFO in !root cgroups on the unified hierarchy for now.

What do you think?

Thanks.

-- 
tejun


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Peter Zijlstra
On Tue, May 05, 2015 at 10:41:04AM -0400, Tejun Heo wrote:
> Hello, Peter.
> 
> On Mon, May 04, 2015 at 02:37:38PM +0200, Peter Zijlstra wrote:
> > > I just realized we allow removing/adding controllers from/to cgroups
> > > while there are tasks in them, which isn't safe unless we eliminate all
> > > can_attach callbacks. We've done so for some cgroup subsystems, but
> > > there are still a few of them...
> > 
> > You can't remove can_attach(), we must be able to disallow joining a
> > cgroup.
> > 
> > If that results in you not being able to change the cgroup setup with
> > tasks in, so be it -- that seems like a sane restriction anyhow.
> 
> This is really an interface policy issue.  For all other controllers,
> it's almost trivial to let organizational operations (setting up
> hierarchies, moving processes around) overrule controller
> configurations.  The main benefit of doing this is that this decouples
> organizational operations from resource control.  Users can depend on
> the fact that allowed organizational operations won't fail due to
> specific controller configuration issues.

But but but... that doesn't make any damn sense! Why would you want to
do something mad like that?

To me the organization is very much part of the control structure. It
cannot be an invariant. Treating it like that destroys the whole notion
of a hierarchy.

> This also works well with controllers accepting target configurations
> regardless of the current state and enforcing rules to converge to the
> configured state instead.

I think we had a long discussion on that which we never finished. I'm
not much for converging to a state. Either it can or it can not and you
hard fail.

With this soft "let's just accept any old crap" mentality you cannot
provide guarantees.

> e.g. if you set max memory lower than the
> currently used, the config will be accepted and the controller will
> keep trying to make the current state converge to the target state.
> This is important as rejecting configuration can lead to chasing game
> between configuration attempts and run-away resource consumption.

This is an entirely different issue; albeit with its own pitfalls, what
if you put the max too low and you run into a never ending reclaim loop?
Attempting to attain the unattainable.

> Now, RR slices are the special case here because it's inherently
> different from every other resource cgroup is concerned with. 

I don't think so, any controller which wants to carve up a fixed
resource in non proportional ways is going to run into this.

It's just that you don't want this, but that doesn't render it less
useful.

> It
> simply doesn't fit into the same model that other resources follow.
> There are several options we can try.
> 
> 1. Decouple RR slices from cpu controller.  This would be the best
>    route to follow.  RR slices need a hard allocator no matter what we
>    do.  There isn't much point in imposing hierarchical structure on
>    top of it.

The same is true of SCHED_DEADLINE, we hard divide a fixed amount. We've
not currently exposed it to cgroups, but we want to eventually.

As to not having a hierarchy; you're the one destroying it by saying the
organization should be decoupled from the controller.

And, no a hierarchy still makes perfect sense, think of containers, they
might not even see the parent.

> 3. Take compromise in the other direction - add exceptions to
>    organizational operations but clearly limit the failure modes.  We
>    prolly want to structure code in a way to enforce this.

I'm for failure modes as you should well know by now ;-)

I really think you're moving in the wrong direction with the whole
cgroup stuff if you just want to willy nilly allow everything.

Also, who's the one doing a PID controller which will hard fail fork?
How are you going to do away with can_attach() there? Surely you need to
dis-allow another task joining when it's at its maximum number of allowed
PIDs, the same condition you're going to fail fork().

So no; hard failure is good and desired. It allows guarantees, which is
a good and desired feature of control.
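
The PID controller point can be illustrated with a toy Python model in
which fork and attach share one admission check. It is not the kernel's
pids controller: the class is hypothetical, and the choice of -EAGAIN
for fork and -EBUSY for attach is an assumption (the latter following
the -EBUSY suggestion elsewhere in this thread).

```python
import errno


class PidsGroup:
    """Toy group with a hard cap on the number of member PIDs."""

    def __init__(self, max_pids):
        self.max_pids = max_pids
        self.pids = set()

    def _has_room(self):
        return len(self.pids) < self.max_pids

    def fork(self, new_pid):
        # A fork inside the group fails once the cap is reached.
        if not self._has_room():
            return -errno.EAGAIN
        self.pids.add(new_pid)
        return 0

    def can_attach(self, pid):
        # Same condition as fork: without it, migration would be a way
        # around the cap.
        return 0 if self._has_room() else -errno.EBUSY

    def attach(self, pid):
        ret = self.can_attach(pid)
        if ret == 0:
            self.pids.add(pid)
        return ret
```

If can_attach() always returned 0 here, a full group could still grow
through migration, which is the hole Peter is pointing at.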


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Peter Zijlstra
On Tue, May 05, 2015 at 10:18:38AM -0400, Tejun Heo wrote:
> > Now you can kludge around some of this, for example you can make the
> > default depend on the parent setting etc.. But that's horribly
> > inconsistent.
> 
> I don't think we can kludge this.  For all other resources, we're
> defining the limits that can't be crossed so nesting them w/ -1 by
> default is fine.  RR slices are different in that we're really slicing
> up and guaranteeing a portion of something finite, so unlimited by
> default thing doesn't really work here.

Note that you _could_ do the same thing with IO bandwidth; esp. with
these modern no-seek-penalty devices this could make sense.


> > So I really prefer not to go that way; if people use RR/FIFO they had
> > better bloody know what they're doing; which includes setting up the
> > system.
> 
> The problem is that this is tied to the normal cpu controller.  Users
> who don't have any intention of mucking with RT scheduling end up
> being dragged into it.  Given the strict nature of RR slicing, I
> don't even think it's actually useful to make the slicing
> hierarchical.  From cgroup's POV, it'd be best if RR slicing can be
> detached.

Like in the other mail; hierarchy still makes perfect sense for the
container case.

> > The whole RR/FIFO thing is so enormously broken (by definition; this
> > truly is unfixable) that you simply _cannot_ automate it.
> 
> Yeah, exactly.

I don't think you're quite agreeing to the same reasons I am. My main
objection to the whole SCHED_RR/FIFO thing as defined by POSIX is that
it does not in fact allow the OS to do what an OS _should_ do, namely
resource arbitration and control.

The whole rt-cgroup controller tries to somewhat contain that, but
fundamentally once you use RR/FIFO you've given up your system to
userspace control -- which btw is why it's usually limited to root.

SCHED_DEADLINE avoids all these problems, at the cost of a more complex
setup.

But the fact that both need fixed portions of a limited total does not
in fact mean they're broken.


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Peter Zijlstra
On Tue, May 05, 2015 at 12:13:35PM -0400, Tejun Heo wrote:
> Hello, Peter.
> 
> On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote:
> ...
> > But but but... that doesn't make any damn sense! Why would you want to
> > do something mad like that?
> > 
> > To me the organization is very much part of the control structure. It
> > cannot be an invariant. Treating it like that destroys the whole notion
> > of a hierarchy.
> 
> You and I don't really agree on this.  The disagreement is fine but
> what I don't get is why this is such a big deal.  How would it break
> the whole notion of a hierarchy?  A user isn't allowed to escape the
> subhierarchy it's allowed in no matter what.  Whether organizational
> operations supersede configurations or not doesn't matter as long as
> the user is confined under the right hierarchy.

I really don't get what you're saying there. If it's not allowed to
'escape' there must be some equivalent of can_attach().

Otherwise you simply cannot reject the move.

> Furthermore, in majority of use cases, organizational operations are
> used to set up the hierarchy when starting up a group and then left
> alone.  For stateful controller like memcg process migrations are
> inherently expensive and intrusive, so the usage model isn't
> arbitrary.  This is a corner case issue and doesn't really affect the
> whole model.

Again, I don't follow, so why is can_attach() bad?

> > I don't think so, any controller which wants to carve up a fixed
> > resource in non proportional ways is going to run into this.
> > 
> > It's just that you don't want this, but that doesn't render it less
> > useful.
> 
> Well, of the resources that we handle right now, it is a special case
> and a sucky one at that because it ties itself to regular cpu
> controller which doesn't need that behavior.

It doesn't 'tie' itself to the cpu controller, it's a fundamental part of
the cpu controller. The cpu controller is about all computation time,
RR/FIFO is very much a part of that.

And RR/FIFO is extra special in that if you grant it to a process, that
process can suck your machine dry of this time. This is why you must
configure it.

People should not unknowingly let programs use RR/FIFO. Also what sorts
of 'problems' are people having because of this? What kind of
applications 'require' RR/FIFO on a normal desktop?

> > As to not having a hierarchy; you're the one destroying it by saying the
> > organization should be decoupled from the controller.
> 
> I don't get this part.  How does making organization supersede
> configuration destroy hierarchy?

If you want to unconditionally allow task migration between groups, the
hierarchy doesn't actually mean anything.

You can't enforce hierarchical constraints. Which to me is the entire
point of having a hierarchy.
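
A minimal sketch of the kind of gate being argued for here, written as
a plain Python model rather than kernel code. The names `Group`, `Task`
and `can_attach` are hypothetical; the rule modelled (refuse a realtime
task when the destination group has no RT runtime budgeted) is an
assumption based on the restriction this patch thread is about.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    rt_runtime_us: int  # 0 means no RT time is budgeted for this group


@dataclass
class Task:
    name: str
    realtime: bool  # True for a SCHED_RR/SCHED_FIFO task


def can_attach(group: Group, task: Task) -> bool:
    """Return True if the move is allowed, False if it must be rejected.

    Hypothetical model: a realtime task may not join a group that has
    no RT runtime, because it could never run there.
    """
    if task.realtime and group.rt_runtime_us == 0:
        return False
    return True
```

Without a hook of this shape the move cannot be refused, so the
destination group's constraint would hold only by convention.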

> > And, no a hierarchy still makes perfect sense, think of containers, they
> > might not even see the parent.
> 
> The mode of configuration is different tho.  No matter what we do, if
> we want to automate this sort of distribution with resource as limited
> as realtime slices, it'll need a separate allocator which can carve
> out resources on demand.

But you don't want to automate, full stop.

> This can't be ratio-distributed or
> soft-capped and having to tie this together with regular cpu
> controller is annoying.

Welcome to actual world issues. Stop pretending this stuff is easy and
can be hidden from the user.

IF people want to use RR/FIFO they had better damn well know what
they're doing. There is no way around that. There are just too many
things that can go wrong with it.

If they don't want to deal with these problems, then tell them to go
away. Do _NOT_ pretend it's easy and fudge it for them.

This on-demand carving thing you mention, that's a _MASSIVE_ fudge. Just
don't even go there.

> > I really think you're moving in the wrong direction with the whole
> > cgroup stuff if you just want to willy nilly allow everything.
> 
> Well, let's agree to disagree on that one.  It's not about allowing
> willy nilly everything but separating out the specification of intent
> from the current state and you also saw how coupling the two tightly
> messed up cpuset.  It can make configuration tedious enough to the
> point where it becomes impractical to use under certain circumstances.

Well, no I didn't see how cpusets was messed up. You see that is where
we start to disagree.

The improvement I wanted to cpusets was to simply disallow hotplug when
there were tasks that could not go elsewhere.

> The thing is, allowing to specify configurations doesn't prevent the
> user from enforcing stricter rules.  The current state is always
> visible to the user and if it fails to converge, the user can take
> whatever actions that it needs to take to remedy the situation.

Right, so how about failing hotplug if there's (user) tasks pinned to a
cpu? That's clearly visible and the user can go fix it if he really
wants to do the unplug.

That's a very similar thing, but 

Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Peter.

On Tue, May 05, 2015 at 05:19:49PM +0200, Peter Zijlstra wrote:
> > I don't think we can kludge this.  For all other resources, we're
> > defining the limits that can't be crossed so nesting them w/ -1 by
> > default is fine.  RR slices are different in that we're really slicing
> > up and guaranteeing a portion of something finite, so unlimited by
> > default thing doesn't really work here.
> 
> Note that you _could_ do the same thing with IO bandwidth; esp. with
> these modern no-seek-penalty devices this could make sense.

Yeah, maybe.  It currently is too unpredictable to do that (at least
from OS side w/ all the layering) but that is a possibility.

> > The problem is that this is tied to the normal cpu controller.  Users
> > who don't have any intention of mucking with RT scheduling end up
> > being dragged into it.  Given the strict nature of RR slicing, I
> > don't even think it's actually useful to make the slicing
> > hierarchical.  From cgroup's POV, it'd be best if RR slicing can be
> > detached.
> 
> Like in the other mail; hierarchy still makes perfect sense for the
> container case.

We'd still need an on-demand arbitration mechanism across containers
no matter what we do which might as well take care of everything.  But
please see below.

> > > The whole RR/FIFO thing is so enormously broken (by definition; this
> > > truly is unfixable) that you simply _cannot_ automate it.
> > 
> > Yeah, exactly.
> 
> I don't think you're quite agreeing to the same reasons I am. My main
> objection to the whole SCHED_RR/FIFO thing as defined by POSIX is that
> it does not in fact allow the OS to do what an OS _should_ do, namely
> resource arbitration and control.
> 
> The whole rt-cgroup controller tries to somewhat contain that, but
> fundamentally once you use RR/FIFO you've given up your system to
> userspace control -- which btw is why it's usually limited to root.
> 
> SCHED_DEADLINE avoids all these problems, at the cost of a more complex
> setup.
> 
> But the fact that both need fixed portions of a limited total does not
> in fact mean they're broken.

But that does make them pretty different from others.  What bothers me
the most about RR slices right now is that it's tightly coupled with
the rest of cpu controller while having a very different set of
characteristics.  Maybe this is something mandated by the underlying
structure and we have to live with it but it definitely isn't an ideal
situation.

What I don't want to happen is controllers failing migrations
willy-nilly for random reasons leaving users baffled, which we've
actually been doing unfortunately.  Maybe we need to deal with this
fixed resource arbitration as a separate class and allow them to fail
migration w/ -EBUSY.

Thanks.

-- 
tejun


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Peter.

On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote:
...
> But but but... that doesn't make any damn sense! Why would you want to
> do something mad like that?
> 
> To me the organization is very much part of the control structure. It
> cannot be an invariant. Treating it like that destroys the whole notion
> of a hierarchy.

You and I don't really agree on this.  The disagreement is fine but
what I don't get is why this is such a big deal.  How would it break
the whole notion of a hierarchy?  A user isn't allowed to escape the
subhierarchy it's allowed in no matter what.  Whether organizational
operations supersede configurations or not doesn't matter as long as
the user is confined under the right hierarchy.

Furthermore, in majority of use cases, organizational operations are
used to set up the hierarchy when starting up a group and then left
alone.  For stateful controller like memcg process migrations are
inherently expensive and intrusive, so the usage model isn't
arbitrary.  This is a corner case issue and doesn't really affect the
whole model.

> > e.g. if you set max memory lower than the
> > currently used, the config will be accepted and the controller will
> > keep trying to make the current state converge to the target state.
> > This is important as rejecting configuration can lead to chasing game
> > between configuration attempts and run-away resource consumption.
> 
> This is an entirely different issue; albeit with its own pitfalls, what
> if you put the max too low and you run into a never ending reclaim loop?
> Attempting to attain the unattainable.

That's an oom condition and memcg handles it accordingly.

> > Now, RR slices are the special case here because it's inherently
> > different from every other resource cgroup is concerned with. 
> 
> I don't think so, any controller which wants to carve up a fixed
> resource in non proportional ways is going to run into this.
> 
> It's just that you don't want this, but that doesn't render it less
> useful.

Well, of the resources that we handle right now, it is a special case
and a sucky one at that because it ties itself to regular cpu
controller which doesn't need that behavior.

> > It
> > simply doesn't fit into the same model that other resources follow.
> > There are several options we can try.
> > 
> > 1. Decouple RR slices from cpu controller.  This would be the best
> >    route to follow.  RR slices need a hard allocator no matter what we
> >    do.  There isn't much point in imposing hierarchical structure on
> >    top of it.
> 
> The same is true of SCHED_DEADLINE, we hard divide a fixed amount. We've
> not currently exposed it to cgroups, but we want to eventually.
> 
> As to not having a hierarchy; you're the one destroying it by saying the
> organization should be decoupled from the controller.

I don't get this part.  How does making organization supersede
configuration destroy hierarchy?

> And, no a hierarchy still makes perfect sense, think of containers, they
> might not even see the parent.

The mode of configuration is different tho.  No matter what we do, if
we want to automate this sort of distribution with resource as limited
as realtime slices, it'll need a separate allocator which can carve
out resources on demand.  This can't be ratio-distributed or
soft-capped and having to tie this together with regular cpu
controller is annoying.

> > 3. Take compromise in the other direction - add exceptions to
> >    organizational operations but clearly limit the failure modes.  We
> >    prolly want to structure code in a way to enforce this.
> 
> I'm for failure modes as you should well know by now ;-)
> 
> I really think you're moving in the wrong direction with the whole
> cgroup stuff if you just want to willy nilly allow everything.

Well, let's agree to disagree on that one.  It's not about allowing
willy nilly everything but separating out the specification of intent
from the current state and you also saw how coupling the two tightly
messed up cpuset.  It can make configuration tedious enough to the
point where it becomes impractical to use under certain circumstances.

The thing is, allowing to specify configurations doesn't prevent the
user from enforcing stricter rules.  The current state is always
visible to the user and if it fails to converge, the user can take
whatever actions that it needs to take to remedy the situation.

> Also, who's the one doing a PID controller which will hard fail fork?
> How are you going to do away with can_attach() there? Surely you need to
> disallow another task joining when it's at its maximum number of allowed
> PIDs, the same condition you're going to fail fork().

It allows migrations into already capped cgroup.  It just won't allow
new forks.  This isn't different from allowing limit to be lowered
below the current and we *do* want that because otherwise it becomes a
race between whoever is setting the config and whoever is consuming
the 

Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Peter Zijlstra
On Tue, May 05, 2015 at 11:54:31AM +0800, Zefan Li wrote:

> But I was wondering if we can change the default value of cpu.rt_runtime_us
> from 0 to -1? So by default the RT tasks can be attached to a newly-created
> cgroup without users having to make any configuration, and those tasks are
> confined by the parent cgroup, which is what we have with cfs bw control.
> This requires some changes to the code, but I guess it's do-able?

It's tricky.

Imagine:

        root
       /    \
      A      B
     / \    / \
    a1 a2  b1 b2

Now if they all have -1, I cannot set a bw on any except the leaf nodes
([ab][12]). Because the sum of child bw must be smaller than or
equal to the parent bandwidth, and -1 is effectively inf.

Similarly, if A has bw enabled I cannot create a new child with -1.
Because above.
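
The constraint above can be sketched as a small feasibility check. This
is a simplified Python model, not the scheduler's code: it compares raw
runtime values and ignores periods, and the helper names `to_bw` and
`children_fit` are hypothetical. It treats -1 as infinite bandwidth, as
described above.

```python
import math

RUNTIME_INF = -1  # "-1" means unlimited, i.e. effectively infinite bandwidth


def to_bw(runtime_us: int) -> float:
    """Map a configured runtime to a comparable bandwidth value."""
    return math.inf if runtime_us == RUNTIME_INF else float(runtime_us)


def children_fit(parent_runtime_us: int, child_runtimes_us: list) -> bool:
    """True if the sum of the children's bandwidth is <= the parent's."""
    total = sum(to_bw(c) for c in child_runtimes_us)
    return total <= to_bw(parent_runtime_us)
```

With every group at -1, giving A a finite value fails because a1 and a2
still sum to infinity, while giving leaf a1 a finite value passes
because A itself is still unlimited. Likewise, once A is finite, a new
child at -1 can never fit.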

Now you can kludge around some of this, for example you can make the
default depend on the parent setting etc.. But that's horribly
inconsistent.

So I really prefer not to go that way; if people use RR/FIFO they had
better bloody know what they're doing; which includes setting up the
system.

The whole RR/FIFO thing is so enormously broken (by definition; this
truly is unfixable) that you simply _cannot_ automate it.


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Mike.

On Mon, May 04, 2015 at 07:39:24AM +0200, Mike Galbraith wrote:
> > > Some degree of flexibility is provided so that you may disable some
> > > controllers in a subtree. For example:
> > >
> > > root                 ---> child1
> > > (cpuset,memory,cpu)       (cpuset,memory)
> > >   \
> > >    \-> child2
> > >        (cpu)
> > 
> > Whew, that's a relief.  Thanks.
> 
> But somehow I'm not feeling a whole lot better.
> 
> "May" means if you don't explicitly take some action to disable group
> scheduling, you get it (I don't care if I have an off button), but that

In the new interface, hierarchy setup and controller configuration are
two separate steps.  Creating subhierarchy doesn't enable controller
automatically and as far as specific controllers are concerned
nothing changes when subhierarchy is created and processes are moved
between them.  If control over specific resources is necessary in a
given hierarchy, the matching controllers should be enabled
explicitly.
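
As a rough illustration of what that means for a task, here is a small
Python model of resolving which cgroup's controller settings apply. It
assumes, as confirmed elsewhere in this thread, that the effective
cgroup is the nearest ancestor with that controller enabled. The
`Cgroup` class and `effective_cgroup` helper are hypothetical names,
not a kernel or library API.

```python
class Cgroup:
    """Hypothetical node in a unified hierarchy."""

    def __init__(self, name, parent=None, controllers=()):
        self.name = name
        self.parent = parent
        self.controllers = set(controllers)  # controllers enabled here


def effective_cgroup(cgroup, controller):
    """Walk towards the root until a cgroup with `controller` enabled."""
    node = cgroup
    while node is not None:
        if controller in node.controllers:
            return node
        node = node.parent
    return None  # controller not enabled anywhere on the path


# The layout from the quoted example.
root = Cgroup("root", controllers=("cpuset", "memory", "cpu"))
child1 = Cgroup("child1", parent=root, controllers=("cpuset", "memory"))
child2 = Cgroup("child2", parent=root, controllers=("cpu",))
```

Here a task in child1 is governed by root for cpu but by child1 for
memory, and a task in child2 is governed by child2 for cpu.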

Thanks.

-- 
tejun


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Mike Galbraith
On Tue, 2015-05-05 at 11:46 +0800, Zefan Li wrote:
> On 2015/5/4 22:09, Mike Galbraith wrote:
> > On Mon, 2015-05-04 at 14:37 +0200, Peter Zijlstra wrote:
> >> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
> >>
> >>> Some degree of flexibility is provided so that you may disable some
> >>> controllers in a subtree. For example:
> >>>
> >>> root                 ---> child1
> >>> (cpuset,memory,cpu)       (cpuset,memory)
> >>>   \
> >>>    \-> child2
> >>>        (cpu)
> >>
> >> Uhm, how does that work? Would a task's effective cgroup be the
> >> first parent that has a controller enabled?
> >>
> >> In particular, in your example, if T were part of child1, would its cpu
> >> controller be root?
> 
> correct.
> 
> > 
> > That's what I'd hope for.  I wanted to try that cgroup.subtree_control
> > gizmo to see for myself, but I don't have one, and probably won't get
> > one until I introduce systemd to my axe (again, it's a slow learner).
> > 
> 
> I'm testing in an environment without systemd.

Lucky you.

> You need to mount cgroup with a special option:
> 
>   # mount -t cgroup -o __DEVEL__sane_behavior xxx /where
> 
> If a cgroup controller has already been mounted without this option,
> you won't see it in the unified hierarchy, so firstly you need to
> delete all cgroups in it and umount it.

Yeah, I found the flag, and systemd is indeed in the way.  You already
verified what subtree_control does, so I needn't squabble with the vile
thing over cgroups possession... immediately anyway.

-Mike



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Peter.

On Tue, May 05, 2015 at 04:10:49PM +0200, Peter Zijlstra wrote:
> Imagine:
> 
>         root
>        /    \
>       A      B
>      / \    / \
>     a1 a2  b1 b2
> 
> Now if they all have -1, I cannot set a bw on any except the leaf nodes
> ([ab][12]). Because the sum of child bw must be smaller than or
> equal to the parent bandwidth, and -1 is effectively inf.
> 
> Similarly, if A has bw enabled I cannot create a new child with -1.
> Because above.
> 
> Now you can kludge around some of this, for example you can make the
> default depend on the parent setting etc.. But that's horribly
> inconsistent.

I don't think we can kludge this.  For all other resources, we're
defining the limits that can't be crossed so nesting them w/ -1 by
default is fine.  RR slices are different in that we're really slicing
up and guaranteeing a portion of something finite, so unlimited by
default thing doesn't really work here.

> So I really prefer not to go that way; if people use RR/FIFO they had
> better bloody know what they're doing; which includes setting up the
> system.

The problem is that this is tied to the normal cpu controller.  Users
who don't have any intention of mucking with RT scheduling end up
being dragged into it.  Given the strict nature of RR slicing, I
don't even think it's actually useful to make the slicing
hierarchical.  From cgroup's POV, it'd be best if RR slicing can be
detached.

> The whole RR/FIFO thing is so enormously broken (by definition; this
> truly is unfixable) that you simply _cannot_ automate it.

Yeah, exactly.

Thanks.

-- 
tejun


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Peter.

On Mon, May 04, 2015 at 02:37:38PM +0200, Peter Zijlstra wrote:
> > I just realized we allow removing/adding controllers from/to cgroups
> > while there are tasks in them, which isn't safe unless we eliminate all
> > can_attach callbacks. We've done so for some cgroup subsystems, but
> > there are still a few of them...
> 
> You can't remove can_attach(), we must be able to disallow joining a
> cgroup.
> 
> If that results in you not being able to change the cgroup setup with
> tasks in, so be it -- that seems like a sane restriction anyhow.

This is really an interface policy issue.  For all other controllers,
it's almost trivial to let organizational operations (setting up
hierarchies, moving processes around) overrule controller
configurations.  The main benefit of doing this is that this decouples
organizational operations from resource control.  Users can depend on
the fact that allowed organizational operations won't fail due to
specific controller configuration issues.

This also works well with controllers accepting target configurations
regardless of the current state and enforcing rules to converge to the
configured state instead.  e.g. if you set max memory lower than the
currently used, the config will be accepted and the controller will
keep trying to make the current state converge to the target state.
This is important as rejecting configuration can lead to chasing game
between configuration attempts and run-away resource consumption.
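
To make the "accept the target, then converge" model concrete, here is
a toy Python sketch. `MemGroup`, `set_limit` and `reclaim_step` are
hypothetical names for illustration only, and the reclaim step is a
stand-in for whatever the real controller does to push usage down.

```python
class MemGroup:
    """Toy model of a group whose limit may be set below current usage."""

    def __init__(self, usage):
        self.usage = usage
        self.limit = None  # None means no limit configured

    def set_limit(self, limit):
        # The configuration is always accepted, even if usage > limit.
        self.limit = limit

    def over_limit(self):
        return self.limit is not None and self.usage > self.limit

    def reclaim_step(self, amount):
        """Reclaim up to `amount` towards the target; return what was freed."""
        if not self.over_limit():
            return 0
        freed = min(amount, self.usage - self.limit)
        self.usage -= freed
        return freed
```

A group at usage 100 accepts a limit of 60 right away; repeated reclaim
steps then bring it down to the target instead of the write being
refused and retried while usage keeps growing.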

Now, RR slices are the special case here because it's inherently
different from every other resource cgroup is concerned with.  It
simply doesn't fit into the same model that other resources follow.
There are several options we can try.

1. Decouple RR slices from cpu controller.  This would be the best
   route to follow.  RR slices need a hard allocator no matter what we
   do.  There isn't much point in imposing hierarchical structure on
   top of it.

2. Implement special case behavior so that it can follow the same
   model.  e.g. resetting RR scheduling config when the effective cpu
   cgroup changes or carrying the amount of slice being consumed with
   the process being moved.  No matter how this is done, it's gonna be
   a clear compromise as we're forcing this into the model which
   doesn't quite fit it.  That said, given how RR slices are a special
   case to begin with, I think this can be acceptable.

3. Take compromise in the other direction - add exceptions to
   organizational operations but clearly limit the failure modes.  We
   prolly want to structure code in a way to enforce this.

4. If #1 can be done in time but not right now, simply disallow any
   RR/FIFO in !root cgroups on the unified hierarchy for now.

What do you think?

Thanks.

-- 
tejun


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Peter Zijlstra
On Tue, May 05, 2015 at 12:13:35PM -0400, Tejun Heo wrote:
> Hello, Peter.
>
> On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote:
> ...
> > But but but... that doesn't make any damn sense! Why would you want to
> > do something mad like that?
> >
> > To me the organization is very much part of the control structure. It
> > cannot be an invariant. Treating it like that destroys the whole notion
> > of a hierarchy.
>
> You and I don't really agree on this.  The disagreement is fine but
> what I don't get is why this is such a big deal.  How would it break
> the whole notion of a hierarchy?  A user isn't allowed to escape the
> subhierarchy it's allowed in no matter what.  Whether organizational
> operations supersede configurations or not doesn't matter as long as
> the user is confined under the right hierarchy.

I really don't get what you're saying there. If it's not allowed to
'escape' there must be some equivalent of can_attach().

Otherwise you simply cannot reject the move.

> Furthermore, in majority of use cases, organizational operations are
> used to set up the hierarchy when starting up a group and then left
> alone.  For stateful controller like memcg process migrations are
> inherently expensive and intrusive, so the usage model isn't
> arbitrary.  This is a corner case issue and doesn't really affect the
> whole model.

Again, I don't follow, so why is can_attach() bad?

> > I don't think so, any controller which wants to carve up a fixed
> > resource in non proportional ways is going to run into this.
> >
> > It's just that you don't want this, but that doesn't render it less
> > useful.
>
> Well, of the resources that we handle right now, it is a special case
> and a sucky one at that because it ties itself to regular cpu
> controller which doesn't need that behavior.

It doesn't 'tie' itself to the cpu controller, it's a fundamental part of
the cpu controller. The cpu controller is about all computation time,
RR/FIFO is very much a part of that.

And RR/FIFO is extra special in that if you grant it to a process, that
process can suck your machine dry of this time. This is why you must
configure it.

People should not unknowingly let programs use RR/FIFO. Also what sorts
of 'problems' are people having because of this? What kind of
applications 'require' RR/FIFO on a normal desktop?

> > As to not having a hierarchy; you're the one destroying it by saying the
> > organization should be decoupled from the controller.
>
> I don't get this part.  How does making organization supersede
> configuration destroy hierarchy?

If you want to unconditionally allow task migration between groups, the
hierarchy doesn't actually mean anything.

You can't enforce hierarchical constraints. Which to me is the entire
point of having a hierarchy.

> > And no, a hierarchy still makes perfect sense, think of containers, they
> > might not even see the parent.
>
> The mode of configuration is different tho.  No matter what we do, if
> we want to automate this sort of distribution with resource as limited
> as realtime slices, it'll need a separate allocator which can carve
> out resources on demand.

But you don't want to automate, full stop.

> This can't be ratio-distributed or
> soft-capped and having to tie this together with regular cpu
> controller is annoying.

Welcome to actual world issues. Stop pretending this stuff is easy and
can be hidden from the user.

IF people want to use RR/FIFO they had better damn well know what
they're doing. There is no way around that. There's just too many
things that can go wrong with it.

If they don't want to deal with these problems, then tell them to go
away. Do _NOT_ pretend it's easy and fudge it for them.

This on-demand carving thing you mention, that's a _MASSIVE_ fudge. Just
don't even go there.

> > I really think you're moving in the wrong direction with the whole
> > cgroup stuff if you just want to willy nilly allow everything.
>
> Well, let's agree to disagree on that one.  It's not about allowing
> willy nilly everything but separating out the specification of intent
> from the current state and you also saw how coupling the two tightly
> messed up cpuset.  It can make configuration tedious enough to the
> point where it becomes impractical to use under certain circumstances.

Well, no I didn't see how cpusets was messed up. You see that is where
we start to disagree.

The improvement I wanted to cpusets was to simply disallow hotplug when
there were tasks that could not go elsewhere.

> The thing is, allowing to specify configurations doesn't prevent the
> user from enforcing stricter rules.  The current state is always
> visible to the user and if it fails to converge, the user can take
> whatever actions that it needs to take to remedy the situation.

Right, so how about failing hotplug if there's (user) tasks pinned to a
cpu? That's clearly visible and the user can go fix it if he really
wants to do the unplug.

That's a very similar thing, but you've argued against it.

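That said, this is not the point we're now arguing about; I want the
hierarchy to actually mean something, and the only way to do that is to
allow can_attach().

Without can_attach() one cannot provide hierarchical constraints.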

Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Peter Zijlstra
On Tue, May 05, 2015 at 10:41:04AM -0400, Tejun Heo wrote:
> Hello, Peter.
>
> On Mon, May 04, 2015 at 02:37:38PM +0200, Peter Zijlstra wrote:
> > > I just realized we allow removing/adding controllers from/to cgroups
> > > while there are tasks in them, which isn't safe unless we eliminate all
> > > can_attach callbacks. We've done so for some cgroup subsystems, but
> > > there are still a few of them...
> >
> > You can't remove can_attach(), we must be able to disallow joining a
> > cgroup.
> >
> > If that results in you not being able to change the cgroup setup with
> > tasks in, so be it -- that seems like a sane restriction anyhow.
>
> This is really an interface policy issue.  For all other controllers,
> it's almost trivial to let organizational operations (setting up
> hierarchies, moving processes around) overrule controller
> configurations.  The main benefit of doing this is that this decouples
> organizational operations from resource control.  Users can depend on
> the fact that allowed organizational operations won't fail due to
> specific controller configuration issues.

But but but... that doesn't make any damn sense! Why would you want to
do something mad like that?

To me the organization is very much part of the control structure. It
cannot be an invariant. Treating it like that destroys the whole notion
of a hierarchy.

> This also works well with controllers accepting target configurations
> regardless of the current state and enforcing rules to converge to the
> configured state instead.

I think we had a long discussion on that which we never finished. I'm
not much for converging to a state. Either it can or it can not and you
hard fail.

With this soft "let's just accept any old crap" mentality you cannot
provide guarantees.

> e.g. if you set max memory lower than the
> currently used, the config will be accepted and the controller will
> keep trying to make the current state converge to the target state.
> This is important as rejecting configuration can lead to chasing game
> between configuration attempts and run-away resource consumption.

This is an entirely different issue; albeit with its own pitfalls, what
if you put the max too low and you run into a never ending reclaim loop?
Attempting to attain the unattainable.

> Now, RR slices are the special case here because it's inherently
> different from every other resource cgroup is concerned with.

I don't think so, any controller which wants to carve up a fixed
resource in non proportional ways is going to run into this.

It's just that you don't want this, but that doesn't render it less
useful.

> It
> simply doesn't fit into the same model that other resources follow.
> There are several options we can try.
>
> 1. Decouple RR slices from cpu controller.  This would be the best
>    route to follow.  RR slices need a hard allocator no matter what we
>    do.  There isn't much point in imposing hierarchical structure on
>    top of it.

The same is true of SCHED_DEADLINE, we hard divide a fixed amount. We've
not currently exposed it to cgroups, but we want to eventually.

As to not having a hierarchy; you're the one destroying it by saying the
organization should be decoupled from the controller.

And no, a hierarchy still makes perfect sense, think of containers, they
might not even see the parent.

> 3. Take compromise in the other direction - add exceptions to
>    organizational operations but clearly limit the failure modes.  We
>    prolly want to structure code in a way to enforce this.

I'm for failure modes as you should well know by now ;-)

I really think you're moving in the wrong direction with the whole
cgroup stuff if you just want to willy nilly allow everything.

Also, who's the one doing a PID controller which will hard fail fork?
How are you going to do away with can_attach() there? Surely you need to
disallow another task joining when it's at its maximum number of allowed
PIDs, the same condition you're going to fail fork().

So no; hard failure is good and desired. It allows guarantees, which is
a good and desired feature of control.


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Peter.

On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote:
...
> But but but... that doesn't make any damn sense! Why would you want to
> do something mad like that?
>
> To me the organization is very much part of the control structure. It
> cannot be an invariant. Treating it like that destroys the whole notion
> of a hierarchy.

You and I don't really agree on this.  The disagreement is fine but
what I don't get is why this is such a big deal.  How would it break
the whole notion of a hierarchy?  A user isn't allowed to escape the
subhierarchy it's allowed in no matter what.  Whether organizational
operations supersede configurations or not doesn't matter as long as
the user is confined under the right hierarchy.

Furthermore, in majority of use cases, organizational operations are
used to set up the hierarchy when starting up a group and then left
alone.  For stateful controller like memcg process migrations are
inherently expensive and intrusive, so the usage model isn't
arbitrary.  This is a corner case issue and doesn't really affect the
whole model.

> > e.g. if you set max memory lower than the
> > currently used, the config will be accepted and the controller will
> > keep trying to make the current state converge to the target state.
> > This is important as rejecting configuration can lead to chasing game
> > between configuration attempts and run-away resource consumption.
>
> This is an entirely different issue; albeit with its own pitfalls, what
> if you put the max too low and you run into a never ending reclaim loop?
> Attempting to attain the unattainable.

That's an oom condition and memcg handles it accordingly.

> > Now, RR slices are the special case here because it's inherently
> > different from every other resource cgroup is concerned with.
>
> I don't think so, any controller which wants to carve up a fixed
> resource in non proportional ways is going to run into this.
>
> It's just that you don't want this, but that doesn't render it less
> useful.

Well, of the resources that we handle right now, it is a special case
and a sucky one at that because it ties itself to regular cpu
controller which doesn't need that behavior.

> > It
> > simply doesn't fit into the same model that other resources follow.
> > There are several options we can try.
> >
> > 1. Decouple RR slices from cpu controller.  This would be the best
> >    route to follow.  RR slices need a hard allocator no matter what we
> >    do.  There isn't much point in imposing hierarchical structure on
> >    top of it.
>
> The same is true of SCHED_DEADLINE, we hard divide a fixed amount. We've
> not currently exposed it to cgroups, but we want to eventually.
>
> As to not having a hierarchy; you're the one destroying it by saying the
> organization should be decoupled from the controller.

I don't get this part.  How does making organization supersede
configuration destroy hierarchy?

> And no, a hierarchy still makes perfect sense, think of containers, they
> might not even see the parent.

The mode of configuration is different tho.  No matter what we do, if
we want to automate this sort of distribution with resource as limited
as realtime slices, it'll need a separate allocator which can carve
out resources on demand.  This can't be ratio-distributed or
soft-capped and having to tie this together with regular cpu
controller is annoying.

> > 3. Take compromise in the other direction - add exceptions to
> >    organizational operations but clearly limit the failure modes.  We
> >    prolly want to structure code in a way to enforce this.
>
> I'm for failure modes as you should well know by now ;-)
>
> I really think you're moving in the wrong direction with the whole
> cgroup stuff if you just want to willy nilly allow everything.

Well, let's agree to disagree on that one.  It's not about allowing
willy nilly everything but separating out the specification of intent
from the current state and you also saw how coupling the two tightly
messed up cpuset.  It can make configuration tedious enough to the
point where it becomes impractical to use under certain circumstances.

The thing is, allowing to specify configurations doesn't prevent the
user from enforcing stricter rules.  The current state is always
visible to the user and if it fails to converge, the user can take
whatever actions that it needs to take to remedy the situation.

> Also, who's the one doing a PID controller which will hard fail fork?
> How are you going to do away with can_attach() there? Surely you need to
> disallow another task joining when it's at its maximum number of allowed
> PIDs, the same condition you're going to fail fork().

It allows migrations into already capped cgroup.  It just won't allow
new forks.  This isn't different from allowing limit to be lowered
below the current and we *do* want that because otherwise it becomes a
race between whoever is setting the config and whoever is consuming
the resources.  You always wanna be able to say stop giving out
resources now.
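
As a toy Python illustration of that behavior (the names are made up for this sketch and it is not the controller's actual code): fork is refused at the limit, while migration into the group always succeeds and may leave the count above the limit.

```python
class PidsGroup:
    """Toy model of a PID-limited group as described above."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def fork(self):
        # Handing out a new PID fails once the group is at or above its limit.
        if self.count >= self.limit:
            return False
        self.count += 1
        return True

    def attach(self):
        # Migration is an organizational operation and is never refused here,
        # so the count can end up above the limit.
        self.count += 1
```

Once the count is above the limit no further forks succeed, so the group can only shrink back toward the configured value.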

 

Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Thomas Gleixner
On Tue, 5 May 2015, Peter Zijlstra wrote:
> On Tue, May 05, 2015 at 12:13:35PM -0400, Tejun Heo wrote:
> > On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote:
> > >
> > > So no; hard failure is good and desired. It allows guarantees, which is
> > > a good and desired feature of control.
> >
> > Isn't that too sweeping a statement?  We want them in some places but
> > not necessarily in all places.  The hard failures aren't going away.
> > They're just localized to specific areas where they're easier to
> > handle.
>
> Easier how? I'm really not seeing how any of this is making things
> easier for anybody.
>
> All I'm seeing is that you're making cgroups useless for people who want
> to guarantee things (eg. the realtime people).

I fully agree and after reading through this thread I really have to
say that this whole notion of relaxing the admission control and then
trying to magically converge to the resource limits is horrible in all
aspects.

Hierarchies must have a strictly inherited and overall consistent
resource management and therefore resource limitation. Otherwise they
are just useless.

The idea of allowing overcommitment and magically converging back
to the limits yells heuristics all over the place and we all know how
reliable heuristics are.

Tejun, you try to make the whole configuration and placement simpler
for the user, but all you achieve is that you act like all these
politicians who promise tax cuts and whatever and forget about them
once the elections are over. How is that going to make stuff simpler
for users/admins? Not at all.

Instead of failing hard at placement/configuration time they get
surprised by hard to understand fallout of magic convergence
heuristics. That's crap and no matter how you argue it stays crap.

As Peter said several times: hard failure is good and desired. It's
very clear information which people can act on. If the failure
modes are willy-nilly today, as you wrote somewhere, then we need to
fix that and make them consistent and understandable and not replace
them by half-baked heuristics which postpone the failure to some point
where it is even less understandable.

If there are issues with run-away problems, i.e. upping a resource
limit which gets eaten up from the existing tasks before you can admit
a new one, then your magic convergence thing is again the wrong
answer. The right approach is:

  1) Up the limit and make a reservation at the same time
  2) Admit the new task and allow it to consume the reservation
  3) Set it effective

You can apply this to ALL sorts of resource controllers and you give
the user a very simple to understand mechanism to control and
configure his system.
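
A minimal Python sketch of those three steps, as a toy model rather than anything from the kernel. The class and method names are invented here, and reading step 3 as "drop whatever reservation is left so the raised limit behaves like an ordinary limit" is an assumption:

```python
class ReservingGroup:
    """Toy model of limit + reservation admission control."""

    def __init__(self, limit, usage=0):
        self.limit = limit
        self.usage = usage
        self.reserved = 0

    def charge(self, amount):
        # Existing tasks cannot eat into the reserved part of the limit.
        if self.usage + self.reserved + amount > self.limit:
            return False
        self.usage += amount
        return True

    def raise_limit_with_reservation(self, extra):
        # Step 1: up the limit and reserve the increase in one operation.
        self.limit += extra
        self.reserved += extra

    def admit(self, amount):
        # Step 2: the newly admitted task consumes from the reservation.
        if amount > self.reserved:
            return False
        self.reserved -= amount
        self.usage += amount
        return True

    def commit(self):
        # Step 3 (assumed meaning): release any unused reservation.
        self.reserved = 0
```

Because the limit and the reservation change together, a run-away task in the group never sees the extra room before the new task has been admitted.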

> Are you really going to force us to abandon cgroups and invent yet
> another grouping thing?

Sigh no. I think cgroups can be fixed, if we just adhere to the basic
principles of hierarchical resource management and remove/reject all
magic "we'll fix that for you" nonsense.

Thanks,

tglx


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Thomas.

On Tue, May 05, 2015 at 08:29:28PM +0200, Thomas Gleixner wrote:
> I fully agree and after reading through this thread I really have to
> say that this whole notion of relaxing the admission control and then
> trying to magically converge to the resource limits is horrible in all
> aspects.

This comes down to controllers allowing limits to be configured below
current usage.  We need to allow and define what happens in that
situation and moving a process into a full cgroup inherently follows
the same pattern albeit from the other direction.

> The idea of allowing overcommitment and magically converging back
> to the limits yells heuristics all over the place and we all know how
> reliable heuristics are.

It's not magic heuristics.  This is a core part of normal operation.

> As Peter said several times: hard failure is good and desired. It's
> very clear information which people can act on. If the failure
> modes are willy-nilly today, as you wrote somewhere, then we need to
> fix that and make them consistent and understandable and not replace
> them by half-baked heuristics which postpone the failure to some point
> where it is even less understandable.

There are no such magic heuristics because controllers need well
defined behaviors when current is above limit anyway and behave
exactly the same way no matter how that state is reached.  For
resources like RR slices, this doesn't work and that's why this is an
issue, so yeah this is the process of finding out what must be able to
fail.

> If there are issues with run-away problems, i.e. upping a resource
> limit which gets eaten up from the existing tasks before you can admit
> a new one, then your magic convergence thing is again the wrong
> answer. The right approach is:
>
>   1) Up the limit and make a reservation at the same time
>   2) Admit the new task and allow it to consume the reservation
>   3) Set it effective

I don't really think this is a scenario we need to worry about.  If we
choose to fail migration, let's just fail it.  There's no point in
building a mechanism to work around malbehavior from its users.

> > Are you really going to force us to abandon cgroups and invent yet
> > another grouping thing?
>
> Sigh no. I think cgroups can be fixed, if we just adhere to the basic
> principles of hierarchical resource management and remove/reject all
> magic "we'll fix that for you" nonsense.

So, let's do -EBUSY for hard resource failures which have to be exact.

Thanks.

-- 
tejun


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Peter Zijlstra
On Tue, May 05, 2015 at 12:31:12PM -0400, Tejun Heo wrote:
>
> What I don't want to happen is controllers failing migrations
> willy-nilly for random reasons leaving users baffled, which we've
> actually been doing unfortunately.  Maybe we need to deal with this
> fixed resource arbitration as a separate class and allow them to fail
> migration w/ -EBUSY.

Ah, _that_ was the problem.

Which is something created by this co-mounting of controllers.

You could of course store the ss-id of the failing operation in
task_struct and have a file reporting the name of the ss-id.

That way, there is a simple way to find out which controller failed the
migrate.
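
Roughly, as a toy Python model: the field name last_failed_controller and the helper below are hypothetical, standing in for an ss-id stored in task_struct and a file exposing its name.

```python
import errno


class Controller:
    def __init__(self, name, can_attach):
        self.name = name
        # can_attach(task, dest) returns 0 on success or a negative errno.
        self.can_attach = can_attach


class Task:
    def __init__(self):
        # Hypothetical stand-in for an ss-id kept in task_struct.
        self.last_failed_controller = None


def migrate(task, dest, controllers):
    """Ask every controller; remember which one refused the move."""
    for ctrl in controllers:
        ret = ctrl.can_attach(task, dest)
        if ret:
            task.last_failed_controller = ctrl.name
            return ret
    task.last_failed_controller = None
    return 0


# Example: memory accepts the move, cpu refuses it with -EBUSY.
controllers = [
    Controller("memory", lambda task, dest: 0),
    Controller("cpu", lambda task, dest: -errno.EBUSY),
]
```

After a failed migrate() the caller gets -EBUSY and can read the recorded name to see that it was the cpu controller which said no.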



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, Peter.

On Tue, May 05, 2015 at 09:00:57PM +0200, Peter Zijlstra wrote:
> On Tue, May 05, 2015 at 12:31:12PM -0400, Tejun Heo wrote:
> > What I don't want to happen is controllers failing migrations
> > willy-nilly for random reasons leaving users baffled, which we've
> > actually been doing unfortunately.  Maybe we need to deal with this
> > fixed resource arbitration as a separate class and allow them to fail
> > migration w/ -EBUSY.
>
> Ah, _that_ was the problem.
>
> Which is something created by this co-mounting of controllers.

Yeah, partly, but also that it's an extra failure mode which isn't
necessary for most controllers.

> You could of course store the ss-id of the failing operation in
> task_struct and have a file reporting the name of the ss-id.
>
> That way, there is a simple way to find out which controller failed the
> migrate.

Given that the resources which can fail are very limited, I don't
think we need that right now as long as we limit and document the
possible failure cases clearly.  Hopefully, this won't devolve into
collection of arbitrary failures.

Thanks.

-- 
tejun


Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Mike Galbraith
On Tue, 2015-05-05 at 11:46 +0800, Zefan Li wrote:
> On 2015/5/4 22:09, Mike Galbraith wrote:
> > On Mon, 2015-05-04 at 14:37 +0200, Peter Zijlstra wrote:
> >> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
> >>
> >>> Some degree of flexibility is provided so that you may disable some
> >>> controllers in a subtree. For example:
> >>>
> >>> root                  ---> child1
> >>> (cpuset,memory,cpu)        (cpuset,memory)
> >>>   \
> >>>    \-> child2
> >>>        (cpu)
> >>
> >> Uhm, how does that work? Would a task their effective cgroup be the
> >> first parent that has a controller enabled?
> >>
> >> In particular, in your example, if T were part of child1, would its cpu
> >> controller be root?
>
> correct.
>
> >
> > That's what I'd hope for.  I wanted to try that cgroup.subtree_control
> > gizmo to see for myself, but I don't have one, and probably won't get
> > one until I introduce systemd to my axe (again, it's a slow learner).
> >
>
> I'm testing in an environment without systemd.

Lucky you.

> You need to mount cgroup with a special option:
>
>   # mount -t cgroup -o __DEVEL__sane_behavior xxx /where
>
> If a cgroup controller has already been mounted without this option,
> you won't see it in the unified hierarchy, so firstly you need to
> delete all cgroups in it and umount it.

Yeah, I found the flag, and systemd is indeed in the way.  You already
verified what subtree_control does, so I needn't squabble with the vile
thing over cgroups possession... immediately anyway.

-Mike



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-05 Thread Tejun Heo
Hello, again.

On Tue, May 05, 2015 at 06:50:06PM +0200, Peter Zijlstra wrote:
> I really don't get what you're saying there. If it's not allowed to
> 'escape' there must be some equivalent of can_attach().
>
> Otherwise you simply cannot reject the move.

A given user isn't allowed to move processes into a cgroup outside its
subhierarchy and the hierarchical resource control keeps the
subhierarchy under the limits no matter what the user does inside it.
Whether can_attach can fail or not is peripheral in this sense - if a
user can move processes into a cgroup outside its allowed scope, the
user can already escape regardless of the specifics of configuration.

Any user of cgroups should be confined to its scope and when it's
confined that way, the hierarchical limits are enforced no matter what
happens in its subhierarchy.

> > Furthermore, in majority of use cases, organizational operations are
> > used to set up the hierarchy when starting up a group and then left
> > alone.  For stateful controller like memcg process migrations are
> > inherently expensive and intrusive, so the usage model isn't
> > arbitrary.  This is a corner case issue and doesn't really affect the
> > whole model.
>
> Again, I don't follow, so why is can_attach() bad?

It's more like can_attach failures don't add much for other
controllers.  Please see below.

> People should not unknowingly let programs use RR/FIFO. Also what sorts
> of 'problems' are people having because of this? What kind of
> applications 'require' RR/FIFO on a normal desktop?

The cases I hear about are mostly audio applications which end up in
whatever default cgroups other applications are put in w/o an easy way
to configure the hierarchy for RR slices.  As I wrote way back, if
these can't be decoupled, whoever is setting up cpu cgroup hierarchies
will also have to take part in distributing realtime slices.

This might not necessarily be a bad thing.  It's just different from
everything else cgroups deal with at this point.

> > I don't get this part.  How does making organization supersede
> > configuration destroy hierarchy?
>
> If you want to unconditionally allow task migration between groups, the
> hierarchy doesn't actually mean anything.
>
> You can't enforce hierarchical constraints. Which to me is the entire
> point of having a hierarchy.

No, hierarchy still puts restrictions on who can do what where.
Whether organization operations supersede configurations or not
doesn't affect this at all.  Again, if you can stow away processes out
of your domain, you're escaping the hierarchical constraints all the
same.  Delegations need to be scoped no matter what.

> > This can't be ratio-distributed or
> > soft-capped and having to tie this together with regular cpu
> > controller is annoying.
>
> Welcome to actual world issues. Stop pretending this stuff is easy and
> can be hidden from the user.
>
> IF people want to use RR/FIFO they had better damn well know what
> they're doing. There is no way around that. There's just too many
> things that can go wrong with it.
>
> If they don't want to deal with these problems, then tell them to go
> away. Do _NOT_ pretend it's easy and fudge it for them.
>
> This on-demand carving thing you mention, that's a _MASSIVE_ fudge. Just
> don't even go there.

How is on-demand allocation fudging?  You can do it manually or you
can have policies set up to allocate the specific resource.  This is
really beside the point tho.  What I was trying to say was that this
takes a different approach from other non-hard resources.

> > Well, let's agree to disagree on that one.  It's not about allowing
> > willy nilly everything but separating out the specification of intent
> > from the current state and you also saw how coupling the two tightly
> > messed up cpuset.  It can make configuration tedious enough to the
> > point where it becomes impractical to use under certain circumstances.
>
> Well, no I didn't see how cpusets was messed up. You see that is where
> we start to disagree.

Yeah, seems that way.  Let's agree to disagree here.

> The improvement I wanted to cpusets was to simply disallow hotplug when
> there were tasks that could not go elsewhere.

Would that mean we're also gonna disallow hotunplug if some threads
are pinned to that cpu?  And the kernel would still be changing
configurations in a non-reversible way.  Again, how does that jibe
with plain affinities?

> That said, this is not the point we're now arguing about; I want the
> hierarchy to actually mean something, and the only way to do that is to
> allow can_attach().
>
> Without can_attach() one cannot provide hierarchical constraints.

I don't think this is the point either.  The point is how to deal with
hard resources that can't be permissive by default.

> > > Also, who's the one doing a PID controller which will hard fail fork?
> > > How are you going to do away with can_attach() there? Surely you need to
> > > disallow another task joining when it's at its maximum number of allowed
> > > PIDs, the same condition you're going to fail fork().

Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-04 Thread Zefan Li
On 2015/5/4 20:37, Peter Zijlstra wrote:
> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
> 
>> Some degree of flexibility is provided so that you may disable some
>> controllers in a subtree. For example:
>>
>> root                  ---> child1
>> (cpuset,memory,cpu)        (cpuset,memory)
>>   \
>>    \-> child2
>>        (cpu)
> 
> Uhm, how does that work? Would a task their effective cgroup be the
> first parent that has a controller enabled?
> 
> In particular, in your example, if T were part of child1, would its cpu
> controller be root?
> 
>> I just realized we allow removing/adding controllers from/to cgroups
>> while there are tasks in them, which isn't safe unless we eliminate all
>> can_attach callbacks. We've done so for some cgroup subsystems, but
>> there are still a few of them...
> 
> You can't remove can_attach(), we must be able to disallow joining a
> cgroup.
> 
> If that results in you not being able to change the cgroup setup with
> tasks in, so be it -- that seems like a sane restriction anyhow.
> 

I wasn't thinking about removing can_attach() before I noticed this issue.

But I was wondering if we can change the default value of cpu.rt_runtime_us
from 0 to -1? So by default the RT tasks can be attached to a newly-created
cgroup without users having to make any configuration, and those tasks are
confined by the parent cgroup, which is what we have with cfs bw control.
This requires some changes to the code, but I guess it's do-able?
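
To illustrate the intended semantics, here is a toy Python model. It is only one reading of the proposal, not the scheduler's code, and the helper names are invented. A group with -1 has no limit of its own and is bounded by whatever its ancestors allow, while 0 means no RT time at all:

```python
class Group:
    def __init__(self, parent=None, rt_runtime_us=-1):
        self.parent = parent
        self.rt_runtime_us = rt_runtime_us


def effective_rt_runtime_us(group):
    """Tightest limit among the group and its ancestors, None if unlimited."""
    limits = []
    node = group
    while node is not None:
        if node.rt_runtime_us != -1:
            limits.append(node.rt_runtime_us)
        node = node.parent
    return min(limits) if limits else None


def can_attach_rt(group):
    # An RT task may join only if some RT runtime is available to the group.
    eff = effective_rt_runtime_us(group)
    return eff is None or eff > 0
```

With a default of 0 a freshly created child refuses RT tasks until someone configures it; with a default of -1 it accepts them and they stay confined by the parent's value.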



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-04 Thread Zefan Li
On 2015/5/4 22:09, Mike Galbraith wrote:
> On Mon, 2015-05-04 at 14:37 +0200, Peter Zijlstra wrote:
>> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
>>
>>> Some degree of flexibility is provided so that you may disable some
>>> controllers in a subtree. For example:
>>>
>>> root                  ---> child1
>>> (cpuset,memory,cpu)        (cpuset,memory)
>>>   \
>>>    \-> child2
>>>        (cpu)
>>
>> Uhm, how does that work? Would a task their effective cgroup be the
>> first parent that has a controller enabled?
>>
>> In particular, in your example, if T were part of child1, would its cpu
>> controller be root?

correct.
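
To make the lookup concrete, here is a minimal Python sketch of the rule confirmed above. It is a toy model and not kernel code: the Cgroup class and effective_cgroup() are made-up names, and it assumes a cgroup without a controller simply falls back to the nearest ancestor that has it enabled.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Cgroup:
    """Toy stand-in for a cgroup node; not a kernel structure."""
    name: str
    controllers: Set[str] = field(default_factory=set)
    parent: Optional["Cgroup"] = None


def effective_cgroup(cg: Cgroup, controller: str) -> Optional[Cgroup]:
    """Return the nearest cgroup, starting at cg and walking up, that has
    the controller enabled, or None if no cgroup on the path has it."""
    node: Optional[Cgroup] = cg
    while node is not None:
        if controller in node.controllers:
            return node
        node = node.parent
    return None


# The hierarchy from the quoted example.
root = Cgroup("root", {"cpuset", "memory", "cpu"})
child1 = Cgroup("child1", {"cpuset", "memory"}, parent=root)
child2 = Cgroup("child2", {"cpu"}, parent=root)
```

With this model, a task T in child1 resolves its cpu controller to root and its memory controller to child1.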

> 
> That's what I'd hope for.  I wanted to try that cgroup.subtree_control
> gizmo to see for myself, but I don't have one, and probably won't get
> one until I introduce systemd to my axe (again, it's a slow learner).
> 

I'm testing in an environment without systemd.

You need to mount cgroup with a special option:

  # mount -t cgroup -o __DEVEL__sane_behavior xxx /where

If a cgroup controller has already been mounted without this option,
you won't see it in the unified hierarchy, so you first need to
delete all cgroups in it and umount it.



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-04 Thread Mike Galbraith
On Mon, 2015-05-04 at 14:37 +0200, Peter Zijlstra wrote:
> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
> 
> > Some degree of flexibility is provided so that you may disable some
> > controllers in a subtree. For example:
> >
> > root                 ---> child1
> > (cpuset,memory,cpu)       (cpuset,memory)
> >   \
> >    \-> child2
> >        (cpu)
> 
> Uhm, how does that work? Would a task's effective cgroup be the
> first parent that has a controller enabled?
> 
> In particular, in your example, if T were part of child1, would its cpu
> controller be root?

That's what I'd hope for.  I wanted to try that cgroup.subtree_control
gizmo to see for myself, but I don't have one, and probably won't get
one until I introduce systemd to my axe (again, it's a slow learner).

-Mike



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-04 Thread Peter Zijlstra
On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:

> Some degree of flexibility is provided so that you may disable some
> controllers in a subtree. For example:
>
> root                 ---> child1
> (cpuset,memory,cpu)       (cpuset,memory)
>   \
>    \-> child2
>        (cpu)

Uhm, how does that work? Would a task's effective cgroup be the
first parent that has a controller enabled?

In particular, in your example, if T were part of child1, would its cpu
controller be root?

> I just realized we allow removing/adding controllers from/to cgroups
> while there are tasks in them, which isn't safe unless we eliminate all
> can_attach callbacks. We've done so for some cgroup subsystems, but
> there are still a few of them...

You can't remove can_attach(), we must be able to disallow joining a
cgroup.

If that results in you not being able to change the cgroup setup with
tasks in, so be it -- that seems like a sane restriction anyhow.



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-04 Thread Mike Galbraith
On Mon, 2015-05-04 at 17:11 +0800, Zefan Li wrote:
> >>> Some degree of flexibility is provided so that you may disable some
> >>> controllers in a subtree. For example:
> >>>
> >>> root                 ---> child1
> >>> (cpuset,memory,cpu)       (cpuset,memory)
> >>>   \
> >>>    \-> child2
> >>>        (cpu)
> >>
> >> Whew, that's a relief.  Thanks.
> > 
> > But somehow I'm not feeling a whole lot better.
> > 
> > "May" means if you don't explicitly take some action to disable group
> > scheduling, you get it (I don't care if I have an off button), but that
> > would also seemingly mean that we would then have rt tasks in taskgroups
> > with no bandwidth allocated, ie you have to make group scheduling for rt
> > tasks meaningless until a bandwidth appeared, and to make bandwidth
> > appear, you'd have to stop the world, distribute, continue, no?
> > 
> > The current "just say no" seems a lot more sensible.
> > 
> 
> I just realized we allow removing/adding controllers from/to cgroups
> while there are tasks in them, which isn't safe unless we eliminate all
> can_attach callbacks. We've done so for some cgroup subsystems, but
> there are still a few of them...

I was pondering the future (or so I thought), but seems it turned into
the past while I wasn't looking.  Oh well, you found a bug anyway.

-Mike



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-04 Thread Zefan Li
>>> Some degree of flexibility is provided so that you may disable some
>>> controllers in a subtree. For example:
>>>
>>> root                 ---> child1
>>> (cpuset,memory,cpu)       (cpuset,memory)
>>>   \
>>>    \-> child2
>>>        (cpu)
>>
>> Whew, that's a relief.  Thanks.
> 
> But somehow I'm not feeling a whole lot better.
> 
> "May" means if you don't explicitly take some action to disable group
> scheduling, you get it (I don't care if I have an off button), but that
> would also seemingly mean that we would then have rt tasks in taskgroups
> with no bandwidth allocated, ie you have to make group scheduling for rt
> tasks meaningless until a bandwidth appeared, and to make bandwidth
> appear, you'd have to stop the world, distribute, continue, no?
> 
> The current "just say no" seems a lot more sensible.
> 

I just realized we allow removing/adding controllers from/to cgroups
while there are tasks in them, which isn't safe unless we eliminate all
can_attach callbacks. We've done so for some cgroup subsystems, but
there are still a few of them...



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-03 Thread Mike Galbraith
On Mon, 2015-05-04 at 07:10 +0200, Mike Galbraith wrote:
> On Mon, 2015-05-04 at 12:39 +0800, Zefan Li wrote:
> 
> > >> We are moving toward unified hierarchy where all the cgroup controllers
> > >> are bound together, so it would make cgroups easier to use if we have less
> > >> restrictions on attaching tasks between cgroups.
> > > 
> > > Forcing group scheduling overhead on users if they want cpuset or memory
> > > cgroup functionality would be far from wonderful.  Am I interpreting the
> > > implications of this unification/binding properly?
> > > 
> > > (I hope not, surely the plan is not to utterly _destroy_ cgroup utility)
> > > 
> > 
> > Some degree of flexibility is provided so that you may disable some
> > controllers in a subtree. For example:
> > 
> > root                 ---> child1
> > (cpuset,memory,cpu)       (cpuset,memory)
> >   \
> >    \-> child2
> >        (cpu)
> 
> Whew, that's a relief.  Thanks.

But somehow I'm not feeling a whole lot better.

"May" means if you don't explicitly take some action to disable group
scheduling, you get it (I don't care if I have an off button), but that
would also seemingly mean that we would then have rt tasks in taskgroups
with no bandwidth allocated, ie you have to make group scheduling for rt
tasks meaningless until a bandwidth appeared, and to make bandwidth
appear, you'd have to stop the world, distribute, continue, no?

The current "just say no" seems a lot more sensible.

-Mike



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-03 Thread Mike Galbraith
On Mon, 2015-05-04 at 12:39 +0800, Zefan Li wrote:

> >> We are moving toward unified hierarchy where all the cgroup controllers
> >> are bound together, so it would make cgroups easier to use if we have less
> >> restrictions on attaching tasks between cgroups.
> > 
> > Forcing group scheduling overhead on users if they want cpuset or memory
> > cgroup functionality would be far from wonderful.  Am I interpreting the
> > implications of this unification/binding properly?
> > 
> > (I hope not, surely the plan is not to utterly _destroy_ cgroup utility)
> > 
> 
> Some degree of flexibility is provided so that you may disable some
> controllers in a subtree. For example:
> 
> root                 ---> child1
> (cpuset,memory,cpu)       (cpuset,memory)
>   \
>    \-> child2
>        (cpu)

Whew, that's a relief.  Thanks.

-Mike




Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-03 Thread Zefan Li
On 2015/5/4 11:13, Mike Galbraith wrote:
> On Mon, 2015-05-04 at 08:54 +0800, Zefan Li wrote:
>> It's allowed to promote a task from normal to realtime after it has been
>> attached to a non-root cgroup, but it will fail if the attaching happens
>> after it has become realtime. I don't see how this restriction is useful.
> 
> In the CONFIG_RT_GROUP_SCHED case, promotion will fail if there is no
> bandwidth allocated.
> 

Right. I forgot to mention that this patch affects !CONFIG_RT_GROUP_SCHED only,
though it should be obvious from reading the change.
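
For readers without the patch at hand, the asymmetry can be pictured with the Python sketch below. It is a toy model built only from the description in this thread, not the kernel's code: Task, can_attach(), attach() and promote_to_rt() are made-up names, and the check on the target being non-root is an assumption taken from the wording above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Toy task: only the fields needed for the illustration."""
    realtime: bool = False
    cgroup: str = "root"


def can_attach(task: Task, target: str) -> bool:
    """Attach-time check as described for the !CONFIG_RT_GROUP_SCHED
    case: a task that is already realtime may not enter a non-root cgroup."""
    return not (task.realtime and target != "root")


def attach(task: Task, target: str) -> bool:
    """Move the task if the check allows it; report whether it moved."""
    if not can_attach(task, target):
        return False
    task.cgroup = target
    return True


def promote_to_rt(task: Task) -> bool:
    """Promotion is not gated on cgroup membership in this model."""
    task.realtime = True
    return True
```

Attaching first and promoting afterwards succeeds, while promoting first and attaching afterwards is refused, although both orders aim at the same end state.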

>> We are moving toward unified hierarchy where all the cgroup controllers
>> are bound together, so it would make cgroups easier to use if we have less
>> restrictions on attaching tasks between cgroups.
> 
> Forcing group scheduling overhead on users if they want cpuset or memory
> cgroup functionality would be far from wonderful.  Am I interpreting the
> implications of this unification/binding properly?
> 
> (I hope not, surely the plan is not to utterly _destroy_ cgroup utility)
> 

Some degree of flexibility is provided so that you may disable some controllers
in a subtree. For example:

root                 ---> child1
(cpuset,memory,cpu)       (cpuset,memory)
  \
   \-> child2
       (cpu)



Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()

2015-05-03 Thread Mike Galbraith
On Mon, 2015-05-04 at 08:54 +0800, Zefan Li wrote:
> It's allowed to promote a task from normal to realtime after it has been
> attached to a non-root cgroup, but it will fail if the attaching happens
> after it has become realtime. I don't see how this restriction is useful.

In the CONFIG_RT_GROUP_SCHED case, promotion will fail if there is no
bandwidth allocated.

> We are moving toward unified hierarchy where all the cgroup controllers
> are bound together, so it would make cgroups easier to use if we have less
> restrictions on attaching tasks between cgroups.

Forcing group scheduling overhead on users if they want cpuset or memory
cgroup functionality would be far from wonderful.  Am I interpreting the
implications of this unification/binding properly?

(I hope not, surely the plan is not to utterly _destroy_ cgroup utility)

-Mike


