Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, 5 May 2015, Tejun Heo wrote:
> On Tue, May 05, 2015 at 08:29:28PM +0200, Thomas Gleixner wrote:
> > As Peter said several times: hard failure is good and desired. It's
> > very clear information on which people can act. If the failure
> > modes are nilly-willy today, as you wrote somewhere, then we need to
> > fix that and make them consistent and understandable and not replace
> > them by half-baked heuristics which postpone the failure to some point
> > where it is even less understandable.
>
> There are no such magic heuristics because controllers need well
> defined behaviors when current is above limit anyway and behave
> exactly the same way no matter how that state is reached. For

How would something go above limit in the first place if your resource
management is done properly?

If a group has a resource limit, then it is not allowed to exceed that
limit. So any attempt to use more resources must fail, period.
There is no way to go above the limit.

If you try to lower the limits of an existing group below the level
which is already used, then this limit restriction attempt must fail.

That's the basic principle of resource management. And if you try to
avoid it, then you have a massive design failure. It's that simple.

Thanks,

	tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
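The two rules stated in the message above (a charge past the limit fails, and lowering the limit below current usage fails) can be illustrated with a small toy model. This is only a sketch of the principle: `StrictGroup`, `charge` and `set_limit` are hypothetical names invented for illustration, not kernel interfaces.

```python
class LimitExceeded(Exception):
    """Raised when an operation would leave usage above the limit."""


class StrictGroup:
    """Toy resource group with a hard limit (hypothetical, not kernel code)."""

    def __init__(self, limit):
        self.limit = limit
        self.usage = 0

    def charge(self, amount):
        # Any attempt to use more than the limit must fail.
        if self.usage + amount > self.limit:
            raise LimitExceeded("charge would exceed the limit")
        self.usage += amount

    def set_limit(self, new_limit):
        # Lowering the limit below what is already used must fail too.
        if new_limit < self.usage:
            raise LimitExceeded("new limit is below current usage")
        self.limit = new_limit
```

With these two checks the invariant `usage <= limit` holds after every operation, which is the property the message argues for; a "converging" controller would instead accept `set_limit` and try to shrink `usage` afterwards.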
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, May 05, 2015 at 03:06:03PM -0400, Tejun Heo wrote:
> Hello, Peter.
>
> On Tue, May 05, 2015 at 09:00:57PM +0200, Peter Zijlstra wrote:
> > On Tue, May 05, 2015 at 12:31:12PM -0400, Tejun Heo wrote:
> > > What I don't want to happen is controllers failing migrations
> > > willy-nilly for random reasons leaving users baffled, which we've
> > > actually been doing unfortunately. Maybe we need to deal with this
> > > fixed resource arbitration as a separate class and allow them to fail
> > > migration w/ -EBUSY.
> >
> > Ah, _that_ was the problem.
> >
> > Which is something created by this co-mounting of controllers.
>
> Yeah, partly, but also that it's an extra failure mode which isn't
> necessary for most controllers.

I can agree with reducing failure modes, but we should not do it at the
cost of functionality.

> > You could of course store the ss-id of the failing operation in
> > task_struct and have a file reporting the name of the ss-id.
> >
> > That way, there is a simple way to find out which controller failed the
> > migrate.
>
> Given that the resources which can fail are very limited, I don't
> think we need that right now as long as we limit and document the
> possible failure cases clearly. Hopefully, this won't devolve into
> collection of arbitrary failures.

Right, but something like that would be fairly trivial to implement and
would give immediate resolution.

For example:

  $ echo 123 > /cgroups/monkey/business/tasks
  -EBUSY
  $ cat /cgroups/monkey/business/errno
  cpu:-EBUSY

(in fact, for a trivial implementation it doesn't matter which
cgroup/errno you cat)
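The reporting idea in the message above (remember which controller rejected a migration, then expose it through a file) could be modelled roughly as follows. This is a userspace sketch under stated assumptions: `Task.last_fail`, `migrate` and `read_errno_file` are hypothetical names, and the `(name, can_attach)` pairs stand in for whatever per-controller hooks a real implementation would walk.

```python
import errno


class Task:
    def __init__(self, pid):
        self.pid = pid
        # (controller name, positive errno) of the last failed migration.
        self.last_fail = None


def migrate(task, controllers):
    """Try to attach; controllers is a list of (name, can_attach) pairs.

    Each can_attach(task) returns 0 on success or a negative errno.
    """
    for name, can_attach in controllers:
        ret = can_attach(task)
        if ret < 0:
            # Record who said no, so the user can look it up afterwards.
            task.last_fail = (name, -ret)
            return ret
    task.last_fail = None
    return 0


def read_errno_file(task):
    """Render the record in the 'cpu:-EBUSY' style used in the example."""
    if task.last_fail is None:
        return ""
    name, err = task.last_fail
    return f"{name}:-{errno.errorcode[err]}"
```

Because the record lives on the task rather than on a cgroup, it does not matter which cgroup's file is read, which matches the remark about the trivial implementation.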
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Peter.

On Tue, May 05, 2015 at 09:00:57PM +0200, Peter Zijlstra wrote:
> On Tue, May 05, 2015 at 12:31:12PM -0400, Tejun Heo wrote:
> > What I don't want to happen is controllers failing migrations
> > willy-nilly for random reasons leaving users baffled, which we've
> > actually been doing unfortunately. Maybe we need to deal with this
> > fixed resource arbitration as a separate class and allow them to fail
> > migration w/ -EBUSY.
>
> Ah, _that_ was the problem.
>
> Which is something created by this co-mounting of controllers.

Yeah, partly, but also that it's an extra failure mode which isn't
necessary for most controllers.

> You could of course store the ss-id of the failing operation in
> task_struct and have a file reporting the name of the ss-id.
>
> That way, there is a simple way to find out which controller failed the
> migrate.

Given that the resources which can fail are very limited, I don't
think we need that right now as long as we limit and document the
possible failure cases clearly. Hopefully, this won't devolve into
collection of arbitrary failures.

Thanks.

--
tejun
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, May 05, 2015 at 12:31:12PM -0400, Tejun Heo wrote:
>
> What I don't want to happen is controllers failing migrations
> willy-nilly for random reasons leaving users baffled, which we've
> actually been doing unfortunately. Maybe we need to deal with this
> fixed resource arbitration as a separate class and allow them to fail
> migration w/ -EBUSY.

Ah, _that_ was the problem.

Which is something created by this co-mounting of controllers.

You could of course store the ss-id of the failing operation in
task_struct and have a file reporting the name of the ss-id.

That way, there is a simple way to find out which controller failed the
migrate.
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Thomas.

On Tue, May 05, 2015 at 08:29:28PM +0200, Thomas Gleixner wrote:
> I fully agree and after reading through this thread I really have to
> say that this whole notion of relaxing the admission control and then
> trying to magically converge to the resource limits is horrible in all
> aspects.

This comes down to controllers allowing limits to be configured below
current usage. We need to allow and define what happens in that
situation and moving a process into a full cgroup inherently follows
the same pattern albeit from the other direction.

> The idea of allowing overcommitment and magically converging back
> to the limits yells heuristics all over the place and we all know how
> reliable heuristics are.

It's not magic heuristics. This is a core part of normal operation.

> As Peter said several times: hard failure is good and desired. It's
> very clear information on which people can act. If the failure
> modes are nilly-willy today, as you wrote somewhere, then we need to
> fix that and make them consistent and understandable and not replace
> them by half-baked heuristics which postpone the failure to some point
> where it is even less understandable.

There are no such magic heuristics because controllers need well
defined behaviors when current is above limit anyway and behave
exactly the same way no matter how that state is reached. For
resources like RR slices, this doesn't work and that's why this is an
issue, so yeah this is the process of finding out what must be able to
fail.

> If there are issues with run-away problems, i.e. upping a resource
> limit which gets eaten up from the existing tasks before you can admit
> a new one, then your magic convergence thing is again the wrong
> answer. The right approach is:
>
> 1) Up the limit and make a reservation at the same time
> 2) Admit the new task and allow it to consume the reservation
> 3) Set it effective

I don't really think this is a scenario we need to worry about.
If we choose to fail migration, let's just fail it. There's no point
in building a mechanism to work around misbehavior from its users.

> > Are you really going to force us to abandon cgroups and invent yet
> > another grouping thing?
>
> Sigh no. I think cgroups can be fixed, if we just adhere to the basic
> principles of hierarchical resource management and remove/reject all
> magic "we'll fix that for you" nonsense.

So, let's do -EBUSY for hard resource failures which have to be exact.

Thanks.

--
tejun
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, again.

On Tue, May 05, 2015 at 06:50:06PM +0200, Peter Zijlstra wrote:
> I really don't get what you're saying there. If it's not allowed to
> 'escape' there must be some equivalent of can_attach().
>
> Otherwise you simply cannot reject the move.

A given user isn't allowed to move processes into a cgroup outside its
subhierarchy and the hierarchical resource control keeps the
subhierarchy under the limits no matter what the user does inside it.
Whether can_attach can fail or not is peripheral in this sense - if a
user can move processes into a cgroup outside its allowed scope, the
user can already escape regardless of the specifics of configuration.
Any user of cgroups should be confined to its scope and when it's
confined that way, the hierarchical limits are enforced no matter what
happens in its subhierarchy.

> > Furthermore, in majority of use cases, organizational operations are
> > used to set up the hierarchy when starting up a group and then left
> > alone. For stateful controller like memcg process migrations are
> > inherently expensive and intrusive, so the usage model isn't
> > arbitrary. This is a corner case issue and doesn't really affect the
> > whole model.
>
> Again, I don't follow, so why is can_attach() bad?

It's more like can_attach failures don't add much for other
controllers. Please see below.

> People should not unknowingly let programs use RR/FIFO. Also what sorts
> of 'problems' are people having because of this? What kind of
> applications 'require' RR/FIFO on a normal desktop?

The cases I hear about are mostly audio applications which end up in
whatever default cgroups other applications are put in w/o an easy way
to configure the hierarchy for RR slices. As I wrote way back, if
these can't be decoupled, whoever is setting up cpu cgroup hierarchies
will also have to take part in distributing realtime slices. This
might not necessarily be a bad thing. It's just different from
everything else cgroups deal with at this point.

> > I don't get this part. How does making organization supersede
> > configuration destroy hierarchy?
>
> If you want to unconditionally allow task migration between groups, the
> hierarchy doesn't actually mean anything.
>
> You can't enforce hierarchical constraints. Which to me is the entire
> point of having a hierarchy.

No, hierarchy still puts restrictions on who can do what where.
Whether organization operations supersede configurations or not
doesn't affect this at all. Again, if you can stow away processes out
of your domain, you're escaping the hierarchical constraints all the
same. Delegations need to be scoped no matter what.

> > This can't be ratio-distributed or
> > soft-capped and having to tie this together with regular cpu
> > controller is annoying.
>
> Welcome to actual world issues. Stop pretending this stuff is easy and
> can be hidden from the user.
>
> IF people want to use RR/FIFO they had better damn well know what
> they're doing. There is no way around that. There's just too many
> things that can go wrong with it.
>
> If they don't want to deal with these problems, then tell them to go
> away. Do _NOT_ pretend it's easy and fudge it for them.
>
> This on-demand carving thing you mention, that's a _MASSIVE_ fudge. Just
> don't even go there.

How is on-demand allocation fudging? You can do it manually or you
can have policies set up to allocate the specific resource. This is
really beside the point tho. What I was trying to say was that this
takes a different approach from other non-hard resources.

> > Well, let's agree to disagree on that one. It's not about allowing
> > willy nilly everything but separating out the specification of intent
> > from the current state and you also saw how coupling the two tightly
> > messed up cpuset. It can make configuration tedious enough to the
> > point where it becomes impractical to use under certain circumstances.
>
> Well, no I didn't see how cpusets was messed up.
> You see that is where
> we start to disagree.

Yeah, seems that way. Let's agree to disagree here.

> The improvement I wanted to cpusets was to simply disallow hotplug when
> there were tasks that could not go elsewhere.

Would that mean we're also gonna disallow hotunplug if some threads
are pinned to that cpu? And the kernel would still be changing
configurations in a non-reversible way. Again, how does that jive
with plain affinities?

> That said, this is not the point we're now arguing about; I want the
> hierarchy to actually mean something, and the only way to do that is to
> allow can_attach().
>
> Without can_attach() one cannot provide hierarchical constraints.

I don't think this is the point either. The point is how to deal with
hard resources that can't be permissive by default.

> > > Also, who's the one doing a PID controller which will hard fail fork?
> > > How are you going to do away with can_attach() there? Surely you need to
> > > dis-allow another task joining when
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, 5 May 2015, Peter Zijlstra wrote:
> On Tue, May 05, 2015 at 12:13:35PM -0400, Tejun Heo wrote:
> > On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote:
> > >
> > > So no; hard failure is good and desired. It allows guarantees, which is
> > > a good and desired feature of control.
> >
> > Isn't that too sweeping a statement? We want them in some places but
> > not necessarily in all places. The hard failures aren't going away.
> > They're just localized to specific areas where they're easier to
> > handle.
>
> Easier how? I'm really not seeing how any of this is making things
> easier for anybody.
>
> All I'm seeing is that you're making cgroups useless for people who want
> to guarantee things (eg. the realtime people).

I fully agree and after reading through this thread I really have to
say that this whole notion of relaxing the admission control and then
trying to magically converge to the resource limits is horrible in all
aspects.

Hierarchies must have a strictly inherited and overall consistent
resource management and therefore resource limitation. Otherwise they
are just useless.

The idea of allowing overcommitment and magically converging back
to the limits yells heuristics all over the place and we all know how
reliable heuristics are.

Tejun, you try to make the whole configuration and placement simpler
for the user, but all you achieve is that you act like all these
politicians who promise tax cuts and whatever and forget about them
once the elections are over.

How is that going to make stuff simpler for users/admins? Not at all.

Instead of failing hard at placement/configuration time they get
surprised by hard to understand fallout of magic convergence
heuristics. That's crap and no matter how you argue it stays crap.

As Peter said several times: hard failure is good and desired. It's
very clear information on which people can act.
If the failure modes are nilly-willy today, as you wrote somewhere,
then we need to fix that and make them consistent and understandable
and not replace them by half-baked heuristics which postpone the
failure to some point where it is even less understandable.

If there are issues with run-away problems, i.e. upping a resource
limit which gets eaten up from the existing tasks before you can admit
a new one, then your magic convergence thing is again the wrong
answer. The right approach is:

 1) Up the limit and make a reservation at the same time
 2) Admit the new task and allow it to consume the reservation
 3) Set it effective

You can apply this to ALL sorts of resource controllers and you give
the user a very simple to understand mechanism to control and
configure his system.

> Are you really going to force us to abandon cgroups and invent yet
> another grouping thing?

Sigh no. I think cgroups can be fixed, if we just adhere to the basic
principles of hierarchical resource management and remove/reject all
magic "we'll fix that for you" nonsense.

Thanks,

	tglx
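One possible reading of the three-step reservation scheme in the message above, as a toy model. Everything here is an assumption made for illustration: the class and method names are hypothetical, and the exact semantics of "set it effective" are interpreted as "release any unconsumed reservation as ordinary headroom".

```python
class ReservingGroup:
    """Toy group where a limit increase is reserved for a newcomer."""

    def __init__(self, limit):
        self.limit = limit     # limit visible to tasks already in the group
        self.usage = 0
        self.reserved = 0      # raised, but not yet usable by existing tasks

    def charge(self, amount):
        # Existing tasks are checked against the effective limit only.
        if self.usage + amount > self.limit:
            return False
        self.usage += amount
        return True

    def raise_with_reservation(self, extra):
        # Step 1: up the limit and make a reservation at the same time.
        self.reserved += extra

    def admit(self, amount):
        # Step 2: the new task may consume the reservation, nothing else.
        if amount > self.reserved:
            return False
        self.reserved -= amount
        self.limit += amount
        self.usage += amount
        return True

    def commit(self):
        # Step 3: set it effective; leftover reservation becomes headroom.
        self.limit += self.reserved
        self.reserved = 0
```

The point of the sketch is the ordering: between steps 1 and 3 the existing tasks cannot eat the increase, so the run-away race described in the thread does not arise in this model.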
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Peter.

On Tue, May 05, 2015 at 04:10:49PM +0200, Peter Zijlstra wrote:
> Imagine:
>
>           root
>           /  \
>          A    B
>         / \  / \
>       a1 a2 b1 b2
>
> Now if they all have -1, I cannot set a bw on any except the leaf nodes
> ([ab][12]). Because the sum of child bw must strictly be smaller or
> equal to the parent bandwidth, and -1 is effectively inf.
>
> Similarly, if A has bw enabled I cannot create a new child with -1.
> Because above.
>
> Now you can kludge around some of this, for example you can make the
> default depend on the parent setting etc.. But that's horribly
> inconsistent.

I don't think we can kludge this. For all other resources, we're
defining the limits that can't be crossed so nesting them w/ -1 by
default is fine. RR slices are different in that we're really slicing
up and guaranteeing a portion of something finite, so unlimited by
default thing doesn't really work here.

> So I really prefer not to go that way; if people use RR/FIFO they had
> better bloody know what they're doing; which includes setting up the
> system.

The problem is that this is tied to the normal cpu controller. Users
who don't have any intention of mucking with RT scheduling end up
being dragged into it. Given the strict nature of RR slicing, I
don't even think it's actually useful to make the slicing
hierarchical. From cgroup's POV, it'd be best if RR slicing can be
detached.

> The whole RR/FIFO thing is so enormously broken (by definition; this
> truly is unfixable) that you simply _cannot_ automate it.

Yeah, exactly.

Thanks.

--
tejun
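The constraint in the quoted example (the sum of the children's bandwidth must not exceed the parent's, with -1 treated as infinite) can be checked mechanically. The sketch below models only that rule with abstract numbers; `Node`, `can_set_bw` and `can_add_child` are hypothetical helpers, not the scheduler's actual admission code.

```python
import math


def eff(bw):
    """Effective bandwidth: -1 stands for 'unlimited'."""
    return math.inf if bw == -1 else bw


class Node:
    def __init__(self, name, parent=None, bw=-1):
        self.name, self.parent, self.bw = name, parent, bw
        self.children = []
        if parent is not None:
            parent.children.append(self)


def can_set_bw(node, new_bw):
    # The node's own children must still fit under the new value.
    if sum(eff(c.bw) for c in node.children) > eff(new_bw):
        return False
    # The node and its siblings must still fit under the parent.
    if node.parent is not None:
        others = sum(eff(c.bw) for c in node.parent.children if c is not node)
        if others + eff(new_bw) > eff(node.parent.bw):
            return False
    return True


def can_add_child(parent, bw=-1):
    # A new child is admissible only if the parent has room for it.
    return sum(eff(c.bw) for c in parent.children) + eff(bw) <= eff(parent.bw)
```

Built on the tree from the message with every node at -1, `can_set_bw(A, 50)` is rejected because A's unlimited children no longer fit, while `can_set_bw(a1, 50)` is accepted; once A carries a finite value, `can_add_child(A, -1)` is rejected.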
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Peter.

On Mon, May 04, 2015 at 02:37:38PM +0200, Peter Zijlstra wrote:
> > I just realized we allow removing/adding controllers from/to cgroups
> > while there are tasks in them, which isn't safe unless we eliminate all
> > can_attach callbacks. We've done so for some cgroup subsystems, but
> > there are still a few of them...
>
> You can't remove can_attach(), we must be able to disallow joining a
> cgroup.
>
> If that results in you not being able to change the cgroup setup with
> tasks in, so be it -- that seems like a sane restriction anyhow.

This is really an interface policy issue. For all other controllers,
it's almost trivial to let organizational operations (setting up
hierarchies, moving processes around) overrule controller
configurations. The main benefit of doing this is that this decouples
organizational operations from resource control. Users can depend on
the fact that allowed organizational operations won't fail due to
specific controller configuration issues.

This also works well with controllers accepting target configurations
regardless of the current state and enforcing rules to converge to the
configured state instead. e.g. if you set max memory lower than the
currently used, the config will be accepted and the controller will
keep trying to make the current state converge to the target state.
This is important as rejecting configuration can lead to chasing game
between configuration attempts and run-away resource consumption.

Now, RR slices are the special case here because it's inherently
different from every other resource cgroup is concerned with. It
simply doesn't fit into the same model that other resources follow.
There are several options we can try.

1. Decouple RR slices from cpu controller. This would be the best
   route to follow. RR slices need a hard allocator no matter what we
   do. There isn't much point in imposing hierarchical structure on
   top of it.

2. Implement special case behavior so that it can follow the same model.
   e.g. resetting RR scheduling config when the effective cpu cgroup
   changes or carrying the amount of slice being consumed with the
   process being moved. No matter how this is done, it's gonna be a
   clear compromise as we're forcing this into the model which doesn't
   quite fit it. That said, given how RR slices are a special case to
   begin with, I think this can be acceptable.

3. Take compromise in the other direction - add exceptions to
   organizational operations but clearly limit the failure modes. We
   prolly want to structure code in a way to enforce this.

4. If #1 can be done in time but not right now, simply disallow any
   RR/FIFO in !root cgroups on the unified hierarchy for now.

What do you think?

Thanks.

--
tejun
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, May 05, 2015 at 10:41:04AM -0400, Tejun Heo wrote:
> Hello, Peter.
>
> On Mon, May 04, 2015 at 02:37:38PM +0200, Peter Zijlstra wrote:
> > > I just realized we allow removing/adding controllers from/to cgroups
> > > while there are tasks in them, which isn't safe unless we eliminate all
> > > can_attach callbacks. We've done so for some cgroup subsystems, but
> > > there are still a few of them...
> >
> > You can't remove can_attach(), we must be able to disallow joining a
> > cgroup.
> >
> > If that results in you not being able to change the cgroup setup with
> > tasks in, so be it -- that seems like a sane restriction anyhow.
>
> This is really an interface policy issue. For all other controllers,
> it's almost trivial to let organizational operations (setting up
> hierarchies, moving processes around) overrule controller
> configurations. The main benefit of doing this is that this decouples
> organizational operations from resource control. Users can depend on
> the fact that allowed organizational operations won't fail due to
> specific controller configuration issues.

But but but... that doesn't make any damn sense! Why would you want to
do something mad like that?

To me the organization is very much part of the control structure. It
cannot be an invariant. Treating it like that destroys the whole notion
of a hierarchy.

> This also works well with controllers accepting target configurations
> regardless of the current state and enforcing rules to converge to the
> configured state instead.

I think we had a long discussion on that which we never finished. I'm
not much for converging to a state. Either it can or it can not and you
hard fail.

With this soft lets just accept any old crap mentality you cannot
provide guarantees.

> e.g. if you set max memory lower than the
> currently used, the config will be accepted and the controller will
> keep trying to make the current state converge to the target state.
> This is important as rejecting configuration can lead to chasing game
> between configuration attempts and run-away resource consumption.

This is an entirely different issue; albeit with its own pitfalls, what
if you put the max too low and you run into a never ending reclaim loop?
Attempting to attain the unattainable.

> Now, RR slices are the special case here because it's inherently
> different from every other resource cgroup is concerned with.

I don't think so, any controller which wants to carve up a fixed
resource in non proportional ways is going to run into this.

It's just that you don't want this, but that doesn't render it less
useful.

> It
> simply doesn't fit into the same model that other resources follow.
> There are several options we can try.
>
> 1. Decouple RR slices from cpu controller. This would be the best
>    route to follow. RR slices need a hard allocator no matter what we
>    do. There isn't much point in imposing hierarchical structure on
>    top of it.

The same is true of SCHED_DEADLINE, we hard divide a fixed amount. We've
not currently exposed it to cgroups, but we want to eventually.

As to not having a hierarchy; you're the one destroying it by saying the
organization should be decoupled from the controller.

And, no a hierarchy still makes perfect sense, think of containers, they
might not even see the parent.

> 3. Take compromise in the other direction - add exceptions to
>    organizational operations but clearly limit the failure modes. We
>    prolly want to structure code in a way to enforce this.

I'm for failure modes as you should well know by now ;-)

I really think you're moving in the wrong direction with the whole
cgroup stuff if you just want to willy nilly allow everything.

Also, who's the one doing a PID controller which will hard fail fork?
How are you going to do away with can_attach() there? Surely you need to
dis-allow another task joining when it's at its maximum number of allowed
PIDs, the same condition you're going to fail fork().
So no; hard failure is good and desired. It allows guarantees, which is
a good and desired feature of control.
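The PID-controller argument in the message above is that fork() and joining must be refused on the same condition, otherwise the limit can be bypassed by migrating tasks in. A toy model of that symmetry follows; `PidsGroup` and its methods are hypothetical names, and the specific errno values are assumptions chosen for illustration rather than what any real controller returns.

```python
import errno


class PidsGroup:
    """Toy PID-limited group (hypothetical, not the kernel's controller)."""

    def __init__(self, max_pids):
        self.max_pids = max_pids
        self.pids = set()

    def _full(self):
        return len(self.pids) >= self.max_pids

    def fork(self, child_pid):
        # fork() inside a full group hard fails.
        if self._full():
            return -errno.EAGAIN
        self.pids.add(child_pid)
        return 0

    def can_attach(self, pid):
        # Joining is refused on exactly the same condition as fork().
        return -errno.EBUSY if self._full() else 0

    def attach(self, pid):
        ret = self.can_attach(pid)
        if ret == 0:
            self.pids.add(pid)
        return ret
```

If `attach` skipped the `can_attach` check, a group at its maximum could still grow through migration, and the guarantee that fork() enforces would no longer hold.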
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, May 05, 2015 at 10:18:38AM -0400, Tejun Heo wrote:
> > Now you can kludge around some of this, for example you can make the
> > default depend on the parent setting etc.. But that's horribly
> > inconsistent.
>
> I don't think we can kludge this. For all other resources, we're
> defining the limits that can't be crossed so nesting them w/ -1 by
> default is fine. RR slices are different in that we're really slicing
> up and guaranteeing a portion of something finite, so unlimited by
> default thing doesn't really work here.

Note that you _could_ do the same thing with IO bandwidth; esp. with
these modern no-seek-penalty devices this could make sense.

> > So I really prefer not to go that way; if people use RR/FIFO they had
> > better bloody know what they're doing; which includes setting up the
> > system.
>
> The problem is that this is tied to the normal cpu controller. Users
> who don't have any intention of mucking with RT scheduling end up
> being dragged into it. Given the strict nature of RR slicing, I
> don't even think it's actually useful to make the slicing
> hierarchical. From cgroup's POV, it'd be best if RR slicing can be
> detached.

Like in the other mail; hierarchy still makes perfect sense for the
container case.

> > The whole RR/FIFO thing is so enormously broken (by definition; this
> > truly is unfixable) that you simply _cannot_ automate it.
>
> Yeah, exactly.

I don't think you're quite agreeing to the same reasons I am. My main
objection to the whole SCHED_RR/FIFO thing as defined by POSIX is that
it does not in fact allow the OS to do what an OS _should_ do, namely
resource arbitration and control.

The whole rt-cgroup controller tries to somewhat contain that, but
fundamentally once you use RR/FIFO you've given up your system to
userspace control -- which btw is why it's usually limited to root.

SCHED_DEADLINE avoids all these problems, at the cost of a more complex
setup.
But the fact that both need fixed portions of a limited total does not
in fact mean they're broken.
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, May 05, 2015 at 12:13:35PM -0400, Tejun Heo wrote:
> Hello, Peter.
>
> On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote:
> ...
> > But but but... that doesn't make any damn sense! Why would you want to
> > do something mad like that?
> >
> > To me the organization is very much part of the control structure. It
> > cannot be an invariant. Treating it like that destroys the whole notion
> > of a hierarchy.
>
> You and I don't really agree on this. The disagreement is fine but
> what I don't get is why this is such a big deal. How would it break
> the whole notion of a hierarchy? A user isn't allowed to escape the
> subhierarchy it's allowed in no matter what. Whether organizational
> operations supersede configurations or not doesn't matter as long as
> the user is confined under the right hierarchy.

I really don't get what you're saying there. If it's not allowed to
'escape' there must be some equivalent of can_attach().

Otherwise you simply cannot reject the move.

> Furthermore, in majority of use cases, organizational operations are
> used to set up the hierarchy when starting up a group and then left
> alone. For stateful controller like memcg process migrations are
> inherently expensive and intrusive, so the usage model isn't
> arbitrary. This is a corner case issue and doesn't really affect the
> whole model.

Again, I don't follow, so why is can_attach() bad?

> > I don't think so, any controller which wants to carve up a fixed
> > resource in non proportional ways is going to run into this.
> >
> > It's just that you don't want this, but that doesn't render it less
> > useful.
>
> Well, of the resources that we handle right now, it is a special case
> and a sucky one at that because it ties itself to regular cpu
> controller which doesn't need that behavior.

It doesn't 'tie' itself to the cpu controller, it's a fundamental part
of the cpu controller. The cpu controller is about all computation
time, RR/FIFO is very much a part of that.
And RR/FIFO is extra special in that if you grant a process that it can suck your machine dry of this time. This is why you must configure it. People should not unknowingly let programs use RR/FIFO. Also what sorts of 'problems' are people having because of this? What kind of applications 'require' RR/FIFO on a normal desktop? > > As to not having a hierarchy; you're the one destroying it by saying the > > organization should be decoupled from the controller. > > I don't get this part. How does making organization supercede > configuration destroy hierarchy? If you want to unconditionally allow task migration between groups, the hierarchy doesn't actually mean anything. You can't enforce hierarchical constraints. Which to me is the entire point of having a hierarchy. > > And, no a hierarchy still makes perfect sense, think of containers, they > > might not even see the parent. > > The mode of configuration is different tho. No matter what we do, if > we want to automate this sort of distribution with resource as limited > as realtime slices, it'll need a separate allocator which can carve > out resources on demand. But you don't want to automate, full stop. > This can't be ratio-distributed or > soft-capped and having to tie this together with regular cpu > controller is annoying. Welcome to actual world issues. Stop pretending this stuff is easy and can be hidden from the user. IF people want to use RR/FIFO they had better damn well know what they're doing. There is not way around that. There's just too many things that can go wrong with it. If they don't want to deal with this problems, then tell them to go away. Do _NOT_ pretend its easy and fudge it for them. This on-demand carving thing you mention, that's a _MASSIVE_ fudge. Just don't even go there. > > I really think you're moving in the wrong direction with the whole > > cgroup stuff if you just want to willy nilly allow everything. > > Well, let's agree to disagree on that one. 
It's not about allowing > willy nilly everything but separating out the specification of intent > from the current state and you also saw how coupling the two tightly > messed up cpuset. It can make configuration tedious enough to the > point where it becomes impractical to use under certain circumstances. Well, no I didn't see how cpusets was messed up. You see that is where we start to disagree. The improvement I wanted to cpusets was to simply disallow hotplug when there were tasks that could not go elsewhere. > The thing is, allowing to specify configurations doesn't prevent the > user from enforcing stricter rules. The current state is always > visible to the user and if it fails to converge, the user can take > whatever actions that it needs to take to remedy the situation. Right, so how about failing hotplug if there's (user) tasks pinned to a cpu? That's clearly visible and the user can go fix it if he really wants to do the unplug. That's a very similar thing, but
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Peter. On Tue, May 05, 2015 at 05:19:49PM +0200, Peter Zijlstra wrote: > > I don't think we can kludge this. For all other resources, we're > > defining the limits that can't be crossed so nesting them w/ -1 by > > default is fine. RR slices are different it that we're really slicing > > up and guaranteeing a portion of something finite, so unlimited by > > default thing doesn't really work here. > > Note that you _could_ do the same thing with IO bandwidth; esp. with > these modern no-seek-penalty devices this could make sense. Yeah, maybe. It currently is too unpredictable to do that (at least from OS side w/ all the layering) but that is a possibility. > > The problem is that this is tied to the normal cpu controller. Users > > who don't have any intention of mucking with RT scheduling end up > > being dragged into it. Given the strict nature of RR slicing, I'm > > don't even think it's actually useful to make the slicing > > hierarchical. From cgroup's POV, it'd be best if RR slicing can be > > detached. > > Like in the other mail; hierarchy still makes perfect sense for the > container case. We'd still need an on-demand arbitration mechanism across containers no matter what we do which might as well take care of everything. But please see below. > > > The whole RR/FIFO thing is so enormously broken (by definition; this > > > truly is unfixable) that you simply _cannot_ automate it. > > > > Yeah, exactly. > > I don't think you're quite agreeing to the same reasons I am. My main > objection to the whole SCHED_RR/FIFO thing as defined by POSIX is that > it does not in fact allow the OS to do what an OS _should_ do, namely > resource arbitration and control. > > The whole rt-cgroup controller tries to somewhat contain that, but > fundamentally once you use RR/FIFO you've given up your system to > userspace control -- which btw is why its usually limited to root. > > SCHED_DEADLINE avoids all these problems, at the cost of a more complex > setup. 
> > But the fact that both need fixed portions of a limited total does not > in fact mean they're broken. But that does make them pretty different from others. What bothers me the most about RR slices right now is that it's tightly coupled with the rest of cpu controller while having a very different set of characteristics. Maybe this is something mandated by the underlying structure and we have to live with it but it definitely isn't an ideal situation. What I don't want to happen is controllers failing migrations willy-nilly for random reasons leaving users baffled, which we've actually been doing unfortunately. Maybe we need to deal with this fixed resource arbitration as a separate class and allow them to fail migration w/ -EBUSY. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Peter. On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote: ... > But but but... that doesn't make any damn sense! Why would you want to > do something mad like that? > > To me the organization is very much part of the control structure. It > cannot be an invariant. Treating it like that destroys the whole notion > of a hierarchy. You and I don't really agree on this. The disagreement is fine but what I don't get is why this is such a big deal. How would it break the whole notion of a hierarchy? A user isn't allowed to esacpe the subhierarchy it's allowed in no matter what. Whether organizational operations supercedes configurations or not doesn't matter as long as the user is confined under the right hierarchy. Furthermore, in majority of use cases, organizational operations are used to set up the hierarchy when starting up a group and then left alone. For stateful controller like memcg process migrations are inherently expensive and intrusive, so the usage model isn't arbitrary. This is a corner case issue and doesn't really affect the whole model. > > e.g. if you set max memory lower than the > > currently used, the config will be accepted and the controller will > > keep trying to make the current state converge to the target state. > > This is important as rejecting configuration can lead to chasing game > > between configuration attempts and run-away resource consumption. > > This is an entirely different issue; albeit with its own pitfalls, what > if you put the max too low and you run into a never ending reclaim loop? > Attempting to attain the unattainable. That's an oom condition and memcg handles it accordingly. > > Now, RR slices are the special case here because it's inherently > > different from every other resource cgroup is concerned with. > > I don't think so, any controller which wants to carve up a fixed > resource in non proportional ways is going to run into this. 
> > Its just that you don't want this, but that doesn't render it less > useful. Well, of the resources that we handle right now, it is a special case and a sucky one at that because it ties itself to regular cpu controller which doesn't need that behavior. > > It > > simply doesn't fit into the same model that other resources follow. > > There are several options we can try. > > > > 1. Decouple RR slices from cpu controller. This would be the best > >route to follow. RR slices need a hard allocator no matter what we > >do. There isn't much point in imposing hierarchical structure on > >top of it. > > The same is true of SCHED_DEADLINE, we hard divide a fixed amount. We've > not currently exposed it to cgroups, but we want to eventually. > > As to not having a hierarchy; you're the one destroying it by saying the > organization should be decoupled from the controller. I don't get this part. How does making organization supercede configuration destroy hierarchy? > And, no a hierarchy still makes perfect sense, think of containers, they > might not even see the parent. The mode of configuration is different tho. No matter what we do, if we want to automate this sort of distribution with resource as limited as realtime slices, it'll need a separate allocator which can carve out resources on demand. This can't be ratio-distributed or soft-capped and having to tie this together with regular cpu controller is annoying. > > 3. Take compromise in the other direction - add exceptions to > >organizational operations but clearly limit the failure modes. We > >prolly want to structure code in a way to enforce this. > > I'm for failure modes as you should well now by know ;-) > > I really think you're moving in the wrong direction with the whole > cgroup stuff if you just want to willy nilly allow everything. Well, let's agree to disagree on that one. 
It's not about allowing willy nilly everything but separating out the specification of intent from the current state and you also saw how coupling the two tightly messed up cpuset. It can make configuration tedious enough to the point where it becomes impractical to use under certain circumstances. The thing is, allowing to specify configurations doesn't prevent the user from enforcing stricter rules. The current state is always visible to the user and if it fails to converge, the user can take whatever actions that it needs to take to remedy the situation. > Also, who's the one doing a PID controller which will hard fail fork? > How are you going to do away with can_attach() there? Surely you need to > dis-allow another task joining when its at its maximum number of allowed > PIDs, the same condition you're going to fail fork(). It allows migrations into already capped cgroup. It just won't allow new forks. This isn't different from allowing limit to be lowered below the current and we *do* want that because otherwise it becomes a race between whoever is setting the config and whoever is consuming the
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, May 05, 2015 at 11:54:31AM +0800, Zefan Li wrote:
> But I was wondering if we can change the default value of cpu.rt_runtime_us
> from 0 to -1? So by default the RT tasks can be attached to a newly-created
> cgroup without users having to make any configuration, and those tasks are
> confined by the parent cgroup, which is what we have with cfs bw control.
> This requires some changes to the code, but I guess it's do-able?

It's tricky. Imagine:

          root
         /    \
        A      B
       / \    / \
     a1   a2 b1  b2

Now if they all have -1, I cannot set a bw on any except the leaf nodes
([ab][12]). Because the sum of child bw must strictly be smaller or equal to
the parent bandwidth, and -1 is effectively inf.

Similarly, if A has bw enabled I cannot create a new child with -1. Because
above.

Now you can kludge around some of this, for example you can make the default
depend on the parent setting etc.. But that's horribly inconsistent.

So I really prefer not to go that way; if people use RR/FIFO they had better
bloody know what they're doing; which includes setting up the system.

The whole RR/FIFO thing is so enormously broken (by definition; this truly
is unfixable) that you simply _cannot_ automate it.
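The constraint described in the message above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the scheduler's actual code: `Group`, `schedulable`, `try_set` and `try_add_child` are hypothetical names, budgets are compared directly (as if every group used the same period), and `math.inf` stands in for a `cpu.rt_runtime_us` value of -1.

```python
import math

INF = math.inf  # stands in for cpu.rt_runtime_us == -1 (no limit)


class Group:
    """One node of a toy cgroup tree with an RT runtime budget."""

    def __init__(self, name, parent=None, runtime=INF):
        self.name = name
        self.parent = parent
        self.runtime = runtime
        self.children = []
        if parent is not None:
            parent.children.append(self)


def schedulable(group):
    """True if, at every level, the children's budgets sum to <= the parent's."""
    if sum(child.runtime for child in group.children) > group.runtime:
        return False
    return all(schedulable(child) for child in group.children)


def try_set(root, group, runtime):
    """Set a budget, or hard-fail and leave the tree unchanged."""
    old = group.runtime
    group.runtime = runtime
    if schedulable(root):
        return True
    group.runtime = old
    return False


def try_add_child(root, parent, name, runtime=INF):
    """Create a child, or hard-fail if it would oversubscribe the parent."""
    child = Group(name, parent, runtime)
    if schedulable(root):
        return child
    parent.children.remove(child)
    return None


root = Group("root")
a, b = Group("A", root), Group("B", root)
a1, a2 = Group("a1", a), Group("a2", a)
b1, b2 = Group("b1", b), Group("b2", b)

print(try_set(root, a, 400_000))     # False: a1 and a2 are still unlimited
print(try_set(root, a1, 100_000))    # True: a leaf under an unlimited parent
print(try_set(root, a2, 100_000))    # True
print(try_set(root, a, 400_000))     # True: 200000 <= 400000
print(try_add_child(root, a, "a3"))  # None: an unlimited child cannot fit under A
```

In this model an all-unlimited tree only accepts finite budgets bottom-up, and a limited parent cannot take a new unlimited child, which are the two cases the message raises against a -1 default.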
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Mike.

On Mon, May 04, 2015 at 07:39:24AM +0200, Mike Galbraith wrote:
> > > Some degree of flexibility is provided so that you may disable some
> > > controllers in a subtree. For example:
> > >
> > >   root                 ---> child1
> > >   (cpuset,memory,cpu)       (cpuset,memory)
> > >     \
> > >      \-> child2
> > >          (cpu)
> >
> > Whew, that's a relief. Thanks.
>
> But somehow I'm not feeling a whole lot better.
>
> "May" means if you don't explicitly take some action to disable group
> scheduling, you get it (I don't care if I have an off button), but that

In the new interface, hierarchy setup and controller configuration are
two separate steps. Creating a subhierarchy doesn't enable controllers
automatically, and as far as specific controllers are concerned nothing
changes when a subhierarchy is created and processes are moved between
them. If control over specific resources is necessary in a given
hierarchy, the matching controllers should be enabled explicitly.

Thanks.

-- 
tejun
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, 2015-05-05 at 11:46 +0800, Zefan Li wrote:
> On 2015/5/4 22:09, Mike Galbraith wrote:
> > On Mon, 2015-05-04 at 14:37 +0200, Peter Zijlstra wrote:
> >> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
> >>
> >>> Some degree of flexibility is provided so that you may disable some
> >>> controllers in a subtree. For example:
> >>>
> >>>   root                 ---> child1
> >>>   (cpuset,memory,cpu)       (cpuset,memory)
> >>>     \
> >>>      \-> child2
> >>>          (cpu)
> >>
> >> Uhm, how does that work? Would a task's effective cgroup be the
> >> first parent that has a controller enabled?
> >>
> >> In particular, in your example, if T were part of child1, would its cpu
> >> controller be root?
>
> correct.
>
> >
> > That's what I'd hope for. I wanted to try that cgroup.subtree_control
> > gizmo to see for myself, but I don't have one, and probably won't get
> > one until I introduce systemd to my axe (again, it's a slow learner).
> >
>
> I'm testing in an environment without systemd.

Lucky you.

> You need to mount cgroup with a special option:
>
> # mount -t cgroup -o __DEVEL__sane_behavior xxx /where
>
> If a cgroup controller has already been mounted without this option,
> you won't see it in the unified hierarchy, so firstly you need to
> delete all cgroups in it and umount it.

Yeah, I found the flag, and systemd is indeed in the way. You already
verified what subtree_control does, so I needn't squabble with the vile
thing over cgroups possession... immediately anyway.

	-Mike
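The lookup rule confirmed in the exchange above (a task is governed, for a given controller, by the nearest cgroup at or above it that has that controller enabled) can be sketched as follows. This is an illustrative model only: `Cgroup` and `effective_cgroup` are hypothetical names, and the controller sets are copied from the diagram rather than derived from `cgroup.subtree_control` the way the kernel would do it.

```python
class Cgroup:
    """A toy cgroup: a name, a parent, and the controllers enabled on it."""

    def __init__(self, name, controllers, parent=None):
        self.name = name
        self.controllers = frozenset(controllers)
        self.parent = parent


def effective_cgroup(cgroup, controller):
    """Walk up from cgroup to the first node that has controller enabled."""
    node = cgroup
    while node is not None:
        if controller in node.controllers:
            return node
        node = node.parent
    return None  # controller is not enabled anywhere on the path to the root


# The hierarchy from the diagram.
root = Cgroup("root", {"cpuset", "memory", "cpu"})
child1 = Cgroup("child1", {"cpuset", "memory"}, parent=root)
child2 = Cgroup("child2", {"cpu"}, parent=root)

# A task T in child1 has no cpu controller there, so it falls back to root.
print(effective_cgroup(child1, "cpu").name)     # root
print(effective_cgroup(child1, "memory").name)  # child1
print(effective_cgroup(child2, "memory").name)  # root
```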
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Peter.

On Tue, May 05, 2015 at 04:10:49PM +0200, Peter Zijlstra wrote:
> Imagine:
>
>           root
>          /    \
>         A      B
>        / \    / \
>      a1   a2 b1  b2
>
> Now if they all have -1, I cannot set a bw on any except the leaf nodes
> ([ab][12]). Because the sum of child bw must strictly be smaller or equal to
> the parent bandwidth, and -1 is effectively inf.
>
> Similarly, if A has bw enabled I cannot create a new child with -1. Because
> above.
>
> Now you can kludge around some of this, for example you can make the default
> depend on the parent setting etc.. But that's horribly inconsistent.

I don't think we can kludge this. For all other resources, we're
defining the limits that can't be crossed so nesting them w/ -1 by
default is fine. RR slices are different in that we're really slicing
up and guaranteeing a portion of something finite, so unlimited by
default thing doesn't really work here.

> So I really prefer not to go that way; if people use RR/FIFO they had better
> bloody know what they're doing; which includes setting up the system.

The problem is that this is tied to the normal cpu controller. Users
who don't have any intention of mucking with RT scheduling end up
being dragged into it. Given the strict nature of RR slicing, I
don't even think it's actually useful to make the slicing
hierarchical. From cgroup's POV, it'd be best if RR slicing can be
detached.

> The whole RR/FIFO thing is so enormously broken (by definition; this truly
> is unfixable) that you simply _cannot_ automate it.

Yeah, exactly.

Thanks.

-- 
tejun
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, May 05, 2015 at 10:18:38AM -0400, Tejun Heo wrote:
> > Now you can kludge around some of this, for example you can make the default
> > depend on the parent setting etc.. But that's horribly inconsistent.
>
> I don't think we can kludge this. For all other resources, we're
> defining the limits that can't be crossed so nesting them w/ -1 by
> default is fine. RR slices are different in that we're really slicing
> up and guaranteeing a portion of something finite, so unlimited by
> default thing doesn't really work here.

Note that you _could_ do the same thing with IO bandwidth; esp. with
these modern no-seek-penalty devices this could make sense.

> > So I really prefer not to go that way; if people use RR/FIFO they had better
> > bloody know what they're doing; which includes setting up the system.
>
> The problem is that this is tied to the normal cpu controller. Users
> who don't have any intention of mucking with RT scheduling end up
> being dragged into it. Given the strict nature of RR slicing, I
> don't even think it's actually useful to make the slicing
> hierarchical. From cgroup's POV, it'd be best if RR slicing can be
> detached.

Like in the other mail; hierarchy still makes perfect sense for the
container case.

> > The whole RR/FIFO thing is so enormously broken (by definition; this truly
> > is unfixable) that you simply _cannot_ automate it.
>
> Yeah, exactly.

I don't think you're quite agreeing to the same reasons I am. My main
objection to the whole SCHED_RR/FIFO thing as defined by POSIX is that
it does not in fact allow the OS to do what an OS _should_ do, namely
resource arbitration and control.

The whole rt-cgroup controller tries to somewhat contain that, but
fundamentally once you use RR/FIFO you've given up your system to
userspace control -- which btw is why it's usually limited to root.

SCHED_DEADLINE avoids all these problems, at the cost of a more complex
setup.

But the fact that both need fixed portions of a limited total does not
in fact mean they're broken.
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Peter.

On Mon, May 04, 2015 at 02:37:38PM +0200, Peter Zijlstra wrote:
> > I just realized we allow removing/adding controllers from/to cgroups
> > while there are tasks in them, which isn't safe unless we eliminate all
> > can_attach callbacks. We've done so for some cgroup subsystems, but
> > there are still a few of them...
>
> You can't remove can_attach(), we must be able to disallow joining a
> cgroup. If that results in you not being able to change the cgroup setup
> with tasks in, so be it -- that seems like a sane restriction anyhow.

This is really an interface policy issue. For all other controllers,
it's almost trivial to let organizational operations (setting up
hierarchies, moving processes around) overrule controller
configurations. The main benefit of doing this is that this decouples
organizational operations from resource control. Users can depend on
the fact that allowed organizational operations won't fail due to
specific controller configuration issues.

This also works well with controllers accepting target configurations
regardless of the current state and enforcing rules to converge to the
configured state instead. e.g. if you set max memory lower than the
currently used, the config will be accepted and the controller will
keep trying to make the current state converge to the target state.
This is important as rejecting configuration can lead to chasing game
between configuration attempts and run-away resource consumption.

Now, RR slices are the special case here because it's inherently
different from every other resource cgroup is concerned with. It
simply doesn't fit into the same model that other resources follow.
There are several options we can try.

1. Decouple RR slices from cpu controller. This would be the best
   route to follow. RR slices need a hard allocator no matter what we
   do. There isn't much point in imposing hierarchical structure on
   top of it.

2. Implement special case behavior so that it can follow the same
   model. e.g. resetting RR scheduling config when the effective cpu
   cgroup changes or carrying the amount of slice being consumed with
   the process being moved. No matter how this is done, it's gonna be
   a clear compromise as we're forcing this into the model which
   doesn't quite fit it. That said, given how RR slices are a special
   case to begin with, I think this can be acceptable.

3. Take compromise in the other direction - add exceptions to
   organizational operations but clearly limit the failure modes. We
   prolly want to structure code in a way to enforce this.

4. If #1 can be done in time but not right now, simply disallow any
   RR/FIFO in !root cgroups on the unified hierarchy for now.

What do you think?

Thanks.

-- 
tejun
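The two configuration policies argued over in this thread can be contrasted with a small sketch. It is a toy model under stated assumptions, not memcg or scheduler code: `HardFailLimit` and `ConvergingLimit` are hypothetical classes, usage is a plain integer, and "reclaim" is just a bounded decrement.

```python
import errno


class HardFailLimit:
    """Reject any limit below current usage (the hard-failure position)."""

    def __init__(self, usage, limit):
        self.usage = usage
        self.limit = limit

    def set_limit(self, new_limit):
        if new_limit < self.usage:
            raise OSError(errno.EBUSY, "limit below current usage")
        self.limit = new_limit


class ConvergingLimit:
    """Accept any limit, then push usage toward it (the converging position)."""

    def __init__(self, usage, limit):
        self.usage = usage
        self.limit = limit

    def set_limit(self, new_limit):
        self.limit = new_limit  # always accepted, even if below usage

    def charge(self, amount):
        """New consumption is refused while it would exceed the limit."""
        if self.usage + amount > self.limit:
            return False
        self.usage += amount
        return True

    def reclaim(self, max_step):
        """One reclaim pass: free at most max_step units of the excess."""
        freed = min(max(self.usage - self.limit, 0), max_step)
        self.usage -= freed
        return freed


hard = HardFailLimit(usage=800, limit=1000)
try:
    hard.set_limit(500)
except OSError as exc:
    print("rejected:", errno.errorcode[exc.errno])  # rejected: EBUSY

soft = ConvergingLimit(usage=800, limit=1000)
soft.set_limit(500)         # accepted although usage is 800
print(soft.charge(1))       # False: no new consumption while over the limit
while soft.reclaim(100):    # keep reclaiming until nothing is left to free
    pass
print(soft.usage)           # 500
```

The converging variant only terminates here because the toy `reclaim` can always free something; a resource that cannot be taken back, such as an already granted RT slice, has no equivalent step, which is the case the numbered options in the message try to address.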
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, May 05, 2015 at 12:13:35PM -0400, Tejun Heo wrote: Hello, Peter. On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote: ... But but but... that doesn't make any damn sense! Why would you want to do something mad like that? To me the organization is very much part of the control structure. It cannot be an invariant. Treating it like that destroys the whole notion of a hierarchy. You and I don't really agree on this. The disagreement is fine but what I don't get is why this is such a big deal. How would it break the whole notion of a hierarchy? A user isn't allowed to esacpe the subhierarchy it's allowed in no matter what. Whether organizational operations supercedes configurations or not doesn't matter as long as the user is confined under the right hierarchy. I really don't get what you're saying there. If its not allowed to 'escape' there must be some equivalent of can_attach(). Otherwise you simply cannot reject the move. Furthermore, in majority of use cases, organizational operations are used to set up the hierarchy when starting up a group and then left alone. For stateful controller like memcg process migrations are inherently expensive and intrusive, so the usage model isn't arbitrary. This is a corner case issue and doesn't really affect the whole model. Again, I don't follow, so why is can_attach() bad? I don't think so, any controller which wants to carve up a fixed resource in non proportional ways is going to run into this. Its just that you don't want this, but that doesn't render it less useful. Well, of the resources that we handle right now, it is a special case and a sucky one at that because it ties itself to regular cpu controller which doesn't need that behavior. It doesn't 'tie' itself to the cpu controller, its a fundamental part of the cpu controller. The cpu controller is about all computation time, RR/FIFO is a very much part of that. 
And RR/FIFO is extra special in that if you grant a process that it can suck your machine dry of this time. This is why you must configure it. People should not unknowingly let programs use RR/FIFO. Also what sorts of 'problems' are people having because of this? What kind of applications 'require' RR/FIFO on a normal desktop? As to not having a hierarchy; you're the one destroying it by saying the organization should be decoupled from the controller. I don't get this part. How does making organization supercede configuration destroy hierarchy? If you want to unconditionally allow task migration between groups, the hierarchy doesn't actually mean anything. You can't enforce hierarchical constraints. Which to me is the entire point of having a hierarchy. And, no a hierarchy still makes perfect sense, think of containers, they might not even see the parent. The mode of configuration is different tho. No matter what we do, if we want to automate this sort of distribution with resource as limited as realtime slices, it'll need a separate allocator which can carve out resources on demand. But you don't want to automate, full stop. This can't be ratio-distributed or soft-capped and having to tie this together with regular cpu controller is annoying. Welcome to actual world issues. Stop pretending this stuff is easy and can be hidden from the user. IF people want to use RR/FIFO they had better damn well know what they're doing. There is not way around that. There's just too many things that can go wrong with it. If they don't want to deal with this problems, then tell them to go away. Do _NOT_ pretend its easy and fudge it for them. This on-demand carving thing you mention, that's a _MASSIVE_ fudge. Just don't even go there. I really think you're moving in the wrong direction with the whole cgroup stuff if you just want to willy nilly allow everything. Well, let's agree to disagree on that one. 
It's not about allowing willy nilly everything but separating out the specification of intent from the current state and you also saw how coupling the two tightly messed up cpuset. It can make configuration tedious enough to the point where it becomes impractical to use under certain circumstances. Well, no I didn't see how cpusets was messed up. You see that is where we start to disagree. The improvement I wanted to cpusets was to simply disallow hotplug when there were tasks that could not go elsewhere. The thing is, allowing to specify configurations doesn't prevent the user from enforcing stricter rules. The current state is always visible to the user and if it fails to converge, the user can take whatever actions that it needs to take to remedy the situation. Right, so how about failing hotplug if there's (user) tasks pinned to a cpu? That's clearly visible and the user can go fix it if he really wants to do the unplug. That's a very similar thing, but you've argued against it. That said, this is not the point we're now arguing
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, May 05, 2015 at 10:41:04AM -0400, Tejun Heo wrote: Hello, Peter. On Mon, May 04, 2015 at 02:37:38PM +0200, Peter Zijlstra wrote: I just realized we allow removing/adding controllers from/to cgroups while there are tasks in them, which isn't safe unless we eliminate all can_attach callbacks. We've done so for some cgroup subsystems, but there are still a few of them... You can't remove can_attach(), we must be able to disallow joining a cgroup. If that results in you not being able to change the cgroup setup with tasks in, so be it -- that seems like a sane restriction anyhow. This is really an interface policy issue. For all other controllers, it's almost trivial to let organizational operations (setting up hierarchies, moving processes around) overrule controller configurations. The main benefit of doing this is that this decouples organizational operations from resource control. Users can depend on the fact that allowed organizational operations won't fail due to specific controller configuration issues. But but but... that doesn't make any damn sense! Why would you want to do something mad like that? To me the organization is very much part of the control structure. It cannot be an invariant. Treating it like that destroys the whole notion of a hierarchy. This also works well with controllers accepting target configurations regardless of the current state and enforcing rules to converge to the configured state instead. I think we had a long discussion on that which we never finished. I'm not much for converging to a state. Either it can or it can not and you hard fail. With this soft lets just accept any old crap mentality you cannot provide guarantees. e.g. if you set max memory lower than the currently used, the config will be accepted and the controller will keep trying to make the current state converge to the target state. 
This is important as rejecting configuration can lead to chasing game between configuration attempts and run-away resource consumption. This is an entirely different issue; albeit with its own pitfalls, what if you put the max too low and you run into a never ending reclaim loop? Attempting to attain the unattainable. Now, RR slices are the special case here because it's inherently different from every other resource cgroup is concerned with. I don't think so, any controller which wants to carve up a fixed resource in non proportional ways is going to run into this. Its just that you don't want this, but that doesn't render it less useful. It simply doesn't fit into the same model that other resources follow. There are several options we can try. 1. Decouple RR slices from cpu controller. This would be the best route to follow. RR slices need a hard allocator no matter what we do. There isn't much point in imposing hierarchical structure on top of it. The same is true of SCHED_DEADLINE, we hard divide a fixed amount. We've not currently exposed it to cgroups, but we want to eventually. As to not having a hierarchy; you're the one destroying it by saying the organization should be decoupled from the controller. And, no a hierarchy still makes perfect sense, think of containers, they might not even see the parent. 3. Take compromise in the other direction - add exceptions to organizational operations but clearly limit the failure modes. We prolly want to structure code in a way to enforce this. I'm for failure modes as you should well now by know ;-) I really think you're moving in the wrong direction with the whole cgroup stuff if you just want to willy nilly allow everything. Also, who's the one doing a PID controller which will hard fail fork? How are you going to do away with can_attach() there? Surely you need to dis-allow another task joining when its at its maximum number of allowed PIDs, the same condition you're going to fail fork(). 
So no; hard failure is good and desired. It allows guarantees, which is a good and desired feature of control.
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Peter. On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote: ... But but but... that doesn't make any damn sense! Why would you want to do something mad like that? To me the organization is very much part of the control structure. It cannot be an invariant. Treating it like that destroys the whole notion of a hierarchy. You and I don't really agree on this. The disagreement is fine but what I don't get is why this is such a big deal. How would it break the whole notion of a hierarchy? A user isn't allowed to esacpe the subhierarchy it's allowed in no matter what. Whether organizational operations supercedes configurations or not doesn't matter as long as the user is confined under the right hierarchy. Furthermore, in majority of use cases, organizational operations are used to set up the hierarchy when starting up a group and then left alone. For stateful controller like memcg process migrations are inherently expensive and intrusive, so the usage model isn't arbitrary. This is a corner case issue and doesn't really affect the whole model. e.g. if you set max memory lower than the currently used, the config will be accepted and the controller will keep trying to make the current state converge to the target state. This is important as rejecting configuration can lead to chasing game between configuration attempts and run-away resource consumption. This is an entirely different issue; albeit with its own pitfalls, what if you put the max too low and you run into a never ending reclaim loop? Attempting to attain the unattainable. That's an oom condition and memcg handles it accordingly. Now, RR slices are the special case here because it's inherently different from every other resource cgroup is concerned with. I don't think so, any controller which wants to carve up a fixed resource in non proportional ways is going to run into this. Its just that you don't want this, but that doesn't render it less useful. 
Well, of the resources that we handle right now, it is a special case and a sucky one at that because it ties itself to regular cpu controller which doesn't need that behavior. It simply doesn't fit into the same model that other resources follow. There are several options we can try. 1. Decouple RR slices from cpu controller. This would be the best route to follow. RR slices need a hard allocator no matter what we do. There isn't much point in imposing hierarchical structure on top of it. The same is true of SCHED_DEADLINE, we hard divide a fixed amount. We've not currently exposed it to cgroups, but we want to eventually. As to not having a hierarchy; you're the one destroying it by saying the organization should be decoupled from the controller. I don't get this part. How does making organization supercede configuration destroy hierarchy? And, no a hierarchy still makes perfect sense, think of containers, they might not even see the parent. The mode of configuration is different tho. No matter what we do, if we want to automate this sort of distribution with resource as limited as realtime slices, it'll need a separate allocator which can carve out resources on demand. This can't be ratio-distributed or soft-capped and having to tie this together with regular cpu controller is annoying. 3. Take compromise in the other direction - add exceptions to organizational operations but clearly limit the failure modes. We prolly want to structure code in a way to enforce this. I'm for failure modes as you should well now by know ;-) I really think you're moving in the wrong direction with the whole cgroup stuff if you just want to willy nilly allow everything. Well, let's agree to disagree on that one. It's not about allowing willy nilly everything but separating out the specification of intent from the current state and you also saw how coupling the two tightly messed up cpuset. 
It can make configuration tedious enough to the point where it becomes impractical to use under certain circumstances. The thing is, allowing to specify configurations doesn't prevent the user from enforcing stricter rules. The current state is always visible to the user and if it fails to converge, the user can take whatever actions that it needs to take to remedy the situation. Also, who's the one doing a PID controller which will hard fail fork? How are you going to do away with can_attach() there? Surely you need to dis-allow another task joining when its at its maximum number of allowed PIDs, the same condition you're going to fail fork(). It allows migrations into already capped cgroup. It just won't allow new forks. This isn't different from allowing limit to be lowered below the current and we *do* want that because otherwise it becomes a race between whoever is setting the config and whoever is consuming the resources. You always wanna be able to say stop giving out resources now.
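The PID-controller behaviour described above (migration into a capped group is allowed, only new forks are refused, and the limit may be lowered below current usage) can be sketched as a toy model. This is not the kernel's implementation; `PidsGroup`, `ForkRefused` and the method names are hypothetical and exist only for illustration.

```python
class ForkRefused(Exception):
    """Raised when a fork would be charged to a group at or above its limit."""


class PidsGroup:
    """Toy model: the limit gates new forks, never migrations or config writes."""

    def __init__(self, max_pids):
        self.max_pids = max_pids
        self.tasks = set()

    def migrate_in(self, task):
        # Organizational operation: always succeeds, even above the limit.
        self.tasks.add(task)

    def set_max(self, new_max):
        # Configuration: accepted even if it is below the current task count.
        self.max_pids = new_max

    def fork(self, child):
        # Resource consumption: the only place where the limit is enforced.
        if len(self.tasks) >= self.max_pids:
            raise ForkRefused("group is at or above its PID limit")
        self.tasks.add(child)
```

In this model a group can sit above its limit after a migration or a lowered `max_pids`, but it cannot grow any further until tasks exit, which is the "stop giving out resources now" behaviour argued for above.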
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, 5 May 2015, Peter Zijlstra wrote:
> On Tue, May 05, 2015 at 12:13:35PM -0400, Tejun Heo wrote:
> > On Tue, May 05, 2015 at 05:11:13PM +0200, Peter Zijlstra wrote:
> > > So no; hard failure is good and desired. It allows guarantees, which is a good and desired feature of control.
> >
> > Isn't that too sweeping a statement? We want them in some places but not necessarily in all places.
> >
> > The hard failures aren't going away. They're just localized to specific areas where they're easier to handle.
>
> Easier how? I'm really not seeing how any of this is making things easier for anybody. All I'm seeing is that you're making cgroups useless for people who want to guarantee things (eg. the realtime people).

I fully agree and after reading through this thread I really have to say that this whole notion of relaxing the admission control and then trying to magically converge to the resource limits is horrible in all aspects.

Hierarchies must have a strictly inherited and overall consistent resource management and therefore resource limitation. Otherwise they are just useless.

The idea of allowing overcommitment and magically converging back to the limits yells heuristics all over the place and we all know how reliable heuristics are.

Tejun, you try to make the whole configuration and placement simpler for the user, but all you achieve is that you act like all these politicians who promise tax cuts and whatever and forget about them once the elections are over.

How is that going to make stuff simpler for users/admins? Not at all. Instead of failing hard at placement/configuration time they get surprised by hard to understand fallout of magic convergence heuristics. That's crap and no matter how you argue it stays crap.

As Peter said several times: hard failure is good and desired. It's very clear information on which people can act. If the failure modes are nilly-willy today, as you wrote somewhere, then we need to fix that and make them consistent and understandable and not replace them by half-baked heuristics which postpone the failure to some point where it is even less understandable.

If there are issues with run-away problems, i.e. upping a resource limit which gets eaten up by the existing tasks before you can admit a new one, then your magic convergence thing is again the wrong answer. The right approach is:

  1) Up the limit and make a reservation at the same time

  2) Admit the new task and allow it to consume the reservation

  3) Set it effective

You can apply this to ALL sorts of resource controllers and you give the user a very simple to understand mechanism to control and configure his system.

> Are you really going to force us to abandon cgroups and invent yet another grouping thing?

Sigh no. I think cgroups can be fixed, if we just adhere to the basic principles of hierarchical resource management and remove/reject all magic "we'll fix that for you" nonsense.

Thanks,

	tglx
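The three-step approach in the message above (raise the limit together with a reservation, admit the new task against that reservation, then make it effective) can be sketched as a toy model. This is only one reading of the proposal, not kernel code: the class `Group` and its methods are hypothetical, and step 3 is interpreted here as the reservation turning into ordinary usage once the task is admitted.

```python
class Group:
    """Toy resource group with a hard limit and earmarked reservations."""

    def __init__(self, limit):
        self.limit = limit    # hard cap on total usage
        self.used = 0         # units consumed by already admitted tasks
        self.reserved = 0     # units set aside for a pending admission

    def charge(self, amount):
        """Existing tasks may only take what is neither used nor reserved."""
        if self.used + self.reserved + amount > self.limit:
            raise RuntimeError("EBUSY: would exceed the limit")
        self.used += amount

    def raise_limit_with_reservation(self, extra):
        """Step 1: raise the limit and reserve the new headroom in one go."""
        self.limit += extra
        self.reserved += extra

    def admit(self, amount):
        """Steps 2 and 3: the new task consumes the reservation, which then
        becomes ordinary usage."""
        if amount > self.reserved:
            raise RuntimeError("EBUSY: reservation too small")
        self.reserved -= amount
        self.used += amount
```

Because the headroom is reserved at the moment the limit is raised, tasks already in the group cannot eat it before the newcomer is admitted, and every attempt to exceed the limit still fails hard.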
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Thomas.

On Tue, May 05, 2015 at 08:29:28PM +0200, Thomas Gleixner wrote:
> I fully agree and after reading through this thread I really have to say that this whole notion of relax the admission control and then try to magically converge to the resource limits is horrible in all aspects.

This comes down to controllers allowing limits to be configured below current usage. We need to allow and define what happens in that situation and moving a process into a full cgroup inherently follows the same pattern albeit from the other direction.

> The idea of allowing overcommitment and magically converging to back to the limits yells heuristics all over the place and we all know how reliable heuristics are.

It's not magic heuristics. This is a core part of normal operation.

> As Peter said several times: hard failure is good and desired. It's a very clear information on which people can act on. If the failures modes are nilly-willy today, as you wrote somewhere, then we need to fix that and make them consistent and understandable and not replace them by half baken heuristics which postpone the failure to some point where it is even less understandable.

There are no such magic heuristics because controllers need well defined behaviors when current is above limit anyway and behave exactly the same way no matter how that state is reached. For resources like RR slices, this doesn't work and that's why this is an issue, so yeah this is the process of finding out what must be able to fail.

> If there are issues with run-away problems, i.e. upping a resource limit which gets eaten up from the existing tasks before you can admit a new one, then your magic convergence thing is again the wrong answer. The right approach is:
>
>   1) Up the limit and make a reservation at the same time
>
>   2) Admit the new task and allow it to consume the reservation
>
>   3) Set it effective

I don't really think this is a scenario we need to worry about. If we choose to fail migration, let's just fail it. There's no point in building a mechanism to work around malbehavior from its users.

> > Are you really going to force us to abandon cgroups and invent yet another grouping thing?
>
> Sigh no. I think cgroups can be fixed, if we just adhere to the basic principles of hierarchical resource management and remove/reject all magic "we'll fix that for you" nonsense.

So, let's do -EBUSY for hard resource failures which have to be exact.

Thanks.

--
tejun
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, May 05, 2015 at 12:31:12PM -0400, Tejun Heo wrote:
> What I don't want to happen is controllers failing migrations willy-nilly for random reasons leaving users baffled, which we've actually been doing unfortunately. Maybe we need to deal with this fixed resource arbitration as a separate class and allow them to fail migration w/ -EBUSY.

Ah, _that_ was the problem.

Which is something created by this co-mounting of controllers.

You could of course store the ss-id of the failing operation in task_struct and have a file reporting the name of the ss-id.

That way, there is a simple way to find out which controller failed the migrate.
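The reporting idea above (remember which subsystem failed the migration and expose its name through a file) can be modelled in a few lines. This is a sketch under assumptions, not a proposed kernel interface: `attach`, `report` and the `last_error` dictionary are hypothetical stand-ins for the attach path, the reporting file and the per-task field in task_struct.

```python
import errno


def attach(task, controllers, last_error):
    """Ask each controller in turn whether it accepts the task.

    `controllers` is a list of (name, can_attach) pairs, where can_attach
    returns 0 on success or a positive errno value on refusal. The first
    refusal is recorded per task and returned as a negative errno.
    """
    for name, can_attach in controllers:
        err = can_attach(task)
        if err:
            last_error[task] = (name, err)
            return -err
    last_error.pop(task, None)
    return 0


def report(task, last_error):
    """Render the recorded failure as '<controller>:-<ERRNO NAME>'."""
    if task not in last_error:
        return ""
    name, err = last_error[task]
    return f"{name}:-{errno.errorcode[err]}"
```

With co-mounted controllers the caller only sees a bare -EBUSY from the write; keeping the subsystem name next to the error code is what turns that into something a user can act on.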
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, Peter.

On Tue, May 05, 2015 at 09:00:57PM +0200, Peter Zijlstra wrote:
> On Tue, May 05, 2015 at 12:31:12PM -0400, Tejun Heo wrote:
> > What I don't want to happen is controllers failing migrations willy-nilly for random reasons leaving users baffled, which we've actually been doing unfortunately. Maybe we need to deal with this fixed resource arbitration as a separate class and allow them to fail migration w/ -EBUSY.
>
> Ah, _that_ was the problem.
>
> Which is something created by this co-mounting of controllers.

Yeah, partly, but also that it's an extra failure mode which isn't necessary for most controllers.

> You could of course store the ss-id of the failing operation in task_struct and have a file reporting the name of the ss-id.
>
> That way, there is a simple way to find out which controller failed the migrate.

Given that the resources which can fail are very limited, I don't think we need that right now as long as we limit and document the possible failure cases clearly. Hopefully, this won't devolve into collection of arbitrary failures.

Thanks.

--
tejun
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Tue, 2015-05-05 at 11:46 +0800, Zefan Li wrote:
> On 2015/5/4 22:09, Mike Galbraith wrote:
> > On Mon, 2015-05-04 at 14:37 +0200, Peter Zijlstra wrote:
> > > On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
> > >
> > > > Some degree of flexibility is provided so that you may disable some controllers in a subtree. For example:
> > > >
> > > >   root  ---> child1
> > > >   (cpuset,memory,cpu)    (cpuset,memory)
> > > >     \
> > > >      \-> child2
> > > >          (cpu)
> > >
> > > Uhm, how does that work? Would a task their effective cgroup be the first parent that has a controller enabled?
> > >
> > > In particular, in your example, if T were part of child1, would its cpu controller be root?
>
> correct.
>
> > That's what I'd hope for. I wanted to try that cgroup.subtree_control gizmo to see for myself, but I don't have one, and probably won't get one until I introduce systemd to my axe (again, it's a slow learner).
>
> I'm testing in an environment without systemd.

Lucky you.

> You need to mount cgroup with a special option:
>
>   # mount -t cgroup -o __DEVEL__sane_behavior xxx /where
>
> If a cgroup controller has already been mounted without this option, you won't see it in the unified hierarchy, so firstly you need to delete all cgroups in it and umount it.

Yeah, I found the flag, and systemd is indeed in the way. You already verified what subtree_control does, so I needn't squabble with the vile thing over cgroups possession... immediately anyway.

	-Mike
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
Hello, again. On Tue, May 05, 2015 at 06:50:06PM +0200, Peter Zijlstra wrote: I really don't get what you're saying there. If its not allowed to 'escape' there must be some equivalent of can_attach(). Otherwise you simply cannot reject the move. A given user isn't allowed to move processes into a cgroup outside its subhierarchy and the hierarchical resource control keeps the subhierarchy under the limits no matter what the user does inside it. Whether can_attach can fail or not is peripheral in this sense - if a user can move processes into a cgroup outside its allowed scope, the user can already escape regardless of the specifics of configuration. Any user of cgroups should be confined to its scope and when it's confined that way, the hierarchical limits are enforced no matter what happens in its subhierarchy. Furthermore, in majority of use cases, organizational operations are used to set up the hierarchy when starting up a group and then left alone. For stateful controller like memcg process migrations are inherently expensive and intrusive, so the usage model isn't arbitrary. This is a corner case issue and doesn't really affect the whole model. Again, I don't follow, so why is can_attach() bad? It's more like can_attach failures don't add much for other controllers. Please see below. People should not unknowingly let programs use RR/FIFO. Also what sorts of 'problems' are people having because of this? What kind of applications 'require' RR/FIFO on a normal desktop? The cases I hear about are mostly audio applications which end up in whatever default cgroups other applications are put in w/o an easy way to configure the hierarchy for RR slices. As I wrote way back, if these can't be decoupled, whoever is setting up cpu cgroup hierarchies will also have to take part in distributing realtime slices. This might not necessarily be a bad thing. It's just different from everything else cgroups deal with at this point. I don't get this part. 
How does making organization supercede configuration destroy hierarchy? If you want to unconditionally allow task migration between groups, the hierarchy doesn't actually mean anything. You can't enforce hierarchical constraints. Which to me is the entire point of having a hierarchy. No, hierarchy still puts restrictions on who can do what where. Whether organization operations supercede configurations or not doens't affect this at all. Again, if you can stow away processes out of your domain, you're escaping the hierarchical constrasints all the same. Delegations need to scoped no matter what. This can't be ratio-distributed or soft-capped and having to tie this together with regular cpu controller is annoying. Welcome to actual world issues. Stop pretending this stuff is easy and can be hidden from the user. IF people want to use RR/FIFO they had better damn well know what they're doing. There is not way around that. There's just too many things that can go wrong with it. If they don't want to deal with this problems, then tell them to go away. Do _NOT_ pretend its easy and fudge it for them. This on-demand carving thing you mention, that's a _MASSIVE_ fudge. Just don't even go there. How is on-demand allocation fudging? You can do it manually or you can have policies set up to allocate the specific resource. This is really beside the point tho. What I was trying to say was that this takes a different approach from other non-hard resources. Well, let's agree to disagree on that one. It's not about allowing willy nilly everything but separating out the specification of intent from the current state and you also saw how coupling the two tightly messed up cpuset. It can make configuration tedious enough to the point where it becomes impractical to use under certain circumstances. Well, no I didn't see how cpusets was messed up. You see that is where we start to disagree. Yeah, seems that way. Let's agree to disagree here. 
The improvement I wanted to cpusets was to simply disallow hotplug when there were tasks that could not go elsewhere. Would that mean we're also gonna disallow hotunplug if some threads are pinned to that cpu? And the kernel would still be changing configurations in an non-reversible way. Again, how does that jive with plain affinities? That said, this is not the point we're now arguing about; I want the hierarchy to actually mean something, and the only way to do that is to allow can_attach(). Without can_attach() one cannot provide hierarchical constraints. I don't think this is the point either. The point is how to deal with hard resources that can't be permissive by default. Also, who's the one doing a PID controller which will hard fail fork? How are you going to do away with can_attach() there? Surely you need to dis-allow another task joining when its at its maximum number of allowed PIDs, the same condition you're going
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On 2015/5/4 20:37, Peter Zijlstra wrote:
> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
>
>> Some degree of flexibility is provided so that you may disable some controllers in a subtree. For example:
>>
>>   root  ---> child1
>>   (cpuset,memory,cpu)    (cpuset,memory)
>>     \
>>      \-> child2
>>          (cpu)
>
> Uhm, how does that work? Would a task their effective cgroup be the first parent that has a controller enabled?
>
> In particular, in your example, if T were part of child1, would its cpu controller be root?
>
>> I just realized we allow removing/adding controllers from/to cgroups while there are tasks in them, which isn't safe unless we eliminate all can_attach callbacks. We've done so for some cgroup subsystems, but there are still a few of them...
>
> You can't remove can_attach(), we must be able to disallow joining a cgroup.
>
> If that results in you not being able to change the cgroup setup with tasks in, so be it -- that seems like a sane restriction anyhow.

I wasn't thinking about removing can_attach() before I noticed this issue.

But I was wondering if we can change the default value of cpu.rt_runtime_us from 0 to -1? So by default the RT tasks can be attached to a newly-created cgroup without users having to make any configuration, and those tasks are confined by the parent cgroup, which is what we have with cfs bw control.

This requires some changes to the code, but I guess it's do-able?
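The effect of the proposed default change can be illustrated with a toy model. This is a sketch of the proposal as worded above, not of existing kernel behaviour: `TaskGroup`, `rt_can_attach` and `confining_group` are hypothetical names, the reading of -1 as "no budget of its own, confined by the parent" is an assumption taken from the message, and 950000 is only an example value for the root budget.

```python
RT_RUNTIME_INHERIT = -1  # assumed meaning in this sketch: use the parent's budget


class TaskGroup:
    def __init__(self, name, rt_runtime_us, parent=None):
        self.name = name
        self.rt_runtime_us = rt_runtime_us
        self.parent = parent


def rt_can_attach(group, task_is_rt):
    """Refuse a realtime task only when the group has an explicit zero budget."""
    return not (task_is_rt and group.rt_runtime_us == 0)


def confining_group(group):
    """Walk up to the nearest group that carries an explicit budget."""
    node = group
    while node.parent is not None and node.rt_runtime_us == RT_RUNTIME_INHERIT:
        node = node.parent
    return node


root = TaskGroup("root", 950000)
old_default = TaskGroup("old_default", 0, parent=root)
new_default = TaskGroup("new_default", RT_RUNTIME_INHERIT, parent=root)
```

With a default of 0 a freshly created group rejects realtime tasks until someone assigns it a budget; with a default of -1 it accepts them and the parent's budget is what bounds them.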
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On 2015/5/4 22:09, Mike Galbraith wrote:
> On Mon, 2015-05-04 at 14:37 +0200, Peter Zijlstra wrote:
>> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
>>
>>> Some degree of flexibility is provided so that you may disable some controllers in a subtree. For example:
>>>
>>>   root  ---> child1
>>>   (cpuset,memory,cpu)    (cpuset,memory)
>>>     \
>>>      \-> child2
>>>          (cpu)
>>
>> Uhm, how does that work? Would a task their effective cgroup be the first parent that has a controller enabled?
>>
>> In particular, in your example, if T were part of child1, would its cpu controller be root?

correct.

> That's what I'd hope for. I wanted to try that cgroup.subtree_control gizmo to see for myself, but I don't have one, and probably won't get one until I introduce systemd to my axe (again, it's a slow learner).

I'm testing in an environment without systemd.

You need to mount cgroup with a special option:

  # mount -t cgroup -o __DEVEL__sane_behavior xxx /where

If a cgroup controller has already been mounted without this option, you won't see it in the unified hierarchy, so firstly you need to delete all cgroups in it and umount it.
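The rule confirmed above (a task is governed, for a given controller, by the nearest ancestor that has that controller enabled) can be sketched for the three-group example. This is an illustrative model only: `Cgroup` and `effective_cgroup` are hypothetical names, and each group is simply labelled with the controller set shown in the diagram rather than modelling cgroup.subtree_control itself.

```python
class Cgroup:
    def __init__(self, name, controllers, parent=None):
        self.name = name
        self.controllers = set(controllers)
        self.parent = parent


def effective_cgroup(cgroup, controller):
    """Return the closest group, starting at `cgroup` and walking towards
    the root, that has `controller` enabled, or None if there is none."""
    node = cgroup
    while node is not None:
        if controller in node.controllers:
            return node
        node = node.parent
    return None


# The hierarchy from the example: child1 has no cpu, child2 has only cpu.
root = Cgroup("root", ["cpuset", "memory", "cpu"])
child1 = Cgroup("child1", ["cpuset", "memory"], parent=root)
child2 = Cgroup("child2", ["cpu"], parent=root)
```

For a task T in child1, `effective_cgroup(child1, "cpu")` resolves to root, which is the answer given in the message above.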
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Mon, 2015-05-04 at 14:37 +0200, Peter Zijlstra wrote:
> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
>
> > Some degree of flexibility is provided so that you may disable some controllers in a subtree. For example:
> >
> >   root  ---> child1
> >   (cpuset,memory,cpu)    (cpuset,memory)
> >     \
> >      \-> child2
> >          (cpu)
>
> Uhm, how does that work? Would a task their effective cgroup be the first parent that has a controller enabled?
>
> In particular, in your example, if T were part of child1, would its cpu controller be root?

That's what I'd hope for. I wanted to try that cgroup.subtree_control gizmo to see for myself, but I don't have one, and probably won't get one until I introduce systemd to my axe (again, it's a slow learner).

	-Mike
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:

> Some degree of flexibility is provided so that you may disable some controllers in a subtree. For example:
>
>   root  ---> child1
>   (cpuset,memory,cpu)    (cpuset,memory)
>     \
>      \-> child2
>          (cpu)

Uhm, how does that work? Would a task their effective cgroup be the first parent that has a controller enabled?

In particular, in your example, if T were part of child1, would its cpu controller be root?

> I just realized we allow removing/adding controllers from/to cgroups while there are tasks in them, which isn't safe unless we eliminate all can_attach callbacks. We've done so for some cgroup subsystems, but there are still a few of them...

You can't remove can_attach(), we must be able to disallow joining a cgroup.

If that results in you not being able to change the cgroup setup with tasks in, so be it -- that seems like a sane restriction anyhow.
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Mon, 2015-05-04 at 17:11 +0800, Zefan Li wrote:
> >>> Some degree of flexibility is provided so that you may disable some controllers in a subtree. For example:
> >>>
> >>>   root  ---> child1
> >>>   (cpuset,memory,cpu)    (cpuset,memory)
> >>>     \
> >>>      \-> child2
> >>>          (cpu)
> >>
> >> Whew, that's a relief. Thanks.
> >
> > But somehow I'm not feeling a whole lot better.
> >
> > "May" means if you don't explicitly take some action to disable group scheduling, you get it (I don't care if I have an off button), but that would also seemingly mean that we would then have rt tasks in taskgroups with no bandwidth allocated, ie you have to make group scheduling for rt tasks meaningless until a bandwidth appeared, and to make bandwidth appear, you'd have to stop the world, distribute, continue, no?
> >
> > The current "just say no" seems a lot more sensible.
>
> I just realized we allow removing/adding controllers from/to cgroups while there are tasks in them, which isn't safe unless we eliminate all can_attach callbacks. We've done so for some cgroup subsystems, but there are still a few of them...

I was pondering the future (or so I thought), but seems it turned into the past while I wasn't looking. Oh well, you found a bug anyway.

	-Mike
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
>>> Some degree of flexibility is provided so that you may disable some
>>> controllers in a subtree. For example:
>>>
>>> root                  ---> child1
>>> (cpuset,memory,cpu)        (cpuset,memory)
>>>     \
>>>      \-> child2
>>>          (cpu)
>>
>> Whew, that's a relief. Thanks.
>
> But somehow I'm not feeling a whole lot better.
>
> "May" means if you don't explicitly take some action to disable group
> scheduling, you get it (I don't care if I have an off button), but that
> would also seemingly mean that we would then have rt tasks in taskgroups
> with no bandwidth allocated, ie you have to make group scheduling for rt
> tasks meaningless until a bandwidth appeared, and to make bandwidth
> appear, you'd have to stop the world, distribute, continue, no?
>
> The current "just say no" seems a lot more sensible.
>

I just realized we allow removing/adding controllers from/to cgroups
while there are tasks in them, which isn't safe unless we eliminate all
can_attach callbacks. We've done so for some cgroup subsystems, but
there are still a few of them...
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Mon, 2015-05-04 at 14:37 +0200, Peter Zijlstra wrote:
> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
> > Some degree of flexibility is provided so that you may disable some
> > controllers in a subtree. For example:
> >
> > root                  ---> child1
> > (cpuset,memory,cpu)        (cpuset,memory)
> >     \
> >      \-> child2
> >          (cpu)
>
> Uhm, how does that work? Would a task their effective cgroup be the
> first parent that has a controller enabled?
>
> In particular, in your example, if T were part of child1, would its cpu
> controller be root?

That's what I'd hope for. I wanted to try that cgroup.subtree_control
gizmo to see for myself, but I don't have one, and probably won't get
one until I introduce systemd to my axe (again, it's a slow learner).

	-Mike
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On 2015/5/4 22:09, Mike Galbraith wrote:
> On Mon, 2015-05-04 at 14:37 +0200, Peter Zijlstra wrote:
>> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
>>> Some degree of flexibility is provided so that you may disable some
>>> controllers in a subtree. For example:
>>>
>>> root                  ---> child1
>>> (cpuset,memory,cpu)        (cpuset,memory)
>>>     \
>>>      \-> child2
>>>          (cpu)
>>
>> Uhm, how does that work? Would a task their effective cgroup be the
>> first parent that has a controller enabled?
>>
>> In particular, in your example, if T were part of child1, would its cpu
>> controller be root?
>

correct.

> That's what I'd hope for. I wanted to try that cgroup.subtree_control
> gizmo to see for myself, but I don't have one, and probably won't get
> one until I introduce systemd to my axe (again, it's a slow learner).
>

I'm testing in an environment without systemd. You need to mount cgroup
with a special option:

  # mount -t cgroup -o __DEVEL__sane_behavior xxx /where

If a cgroup controller has already been mounted without this option, you
won't see it in the unified hierarchy, so firstly you need to delete all
cgroups in it and umount it.
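The lookup rule confirmed above (a task is governed, for a given
controller, by the nearest ancestor that has that controller enabled)
can be sketched as a toy model. This is only an illustration, not kernel
code: the Cgroup class, the effective_cgroup() helper and the assumption
that child1 and child2 are both direct children of root are hypothetical
and chosen to mirror the diagram in the thread.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Cgroup:
    """Toy model of a cgroup node (hypothetical, not a kernel structure)."""
    name: str
    controllers: FrozenSet[str]       # controllers enabled on this node
    parent: Optional["Cgroup"] = None


def effective_cgroup(cg: Cgroup, controller: str) -> Optional[Cgroup]:
    """Return the nearest node, starting at cg, with controller enabled."""
    node: Optional[Cgroup] = cg
    while node is not None:
        if controller in node.controllers:
            return node
        node = node.parent
    return None


# Hierarchy from the example; both children are assumed to hang off root.
root = Cgroup("root", frozenset({"cpuset", "memory", "cpu"}))
child1 = Cgroup("child1", frozenset({"cpuset", "memory"}), parent=root)
child2 = Cgroup("child2", frozenset({"cpu"}), parent=root)

# A task T in child1 has no cpu controller there, so it falls back to root.
print(effective_cgroup(child1, "cpu").name)
```

In this model a task in child1 resolves to root for cpu but to child1
itself for memory, which is the behaviour asked about in the question.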
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On 2015/5/4 20:37, Peter Zijlstra wrote:
> On Mon, May 04, 2015 at 05:11:10PM +0800, Zefan Li wrote:
>> Some degree of flexibility is provided so that you may disable some
>> controllers in a subtree. For example:
>>
>> root                  ---> child1
>> (cpuset,memory,cpu)        (cpuset,memory)
>>     \
>>      \-> child2
>>          (cpu)
>
> Uhm, how does that work? Would a task their effective cgroup be the
> first parent that has a controller enabled?
>
> In particular, in your example, if T were part of child1, would its cpu
> controller be root?
>
>> I just realized we allow removing/adding controllers from/to cgroups
>> while there are tasks in them, which isn't safe unless we eliminate all
>> can_attach callbacks. We've done so for some cgroup subsystems, but
>> there are still a few of them...
>
> You can't remove can_attach(), we must be able to disallow joining a
> cgroup. If that results in you not being able to change the cgroup setup
> with tasks in, so be it -- that seems like a sane restriction anyhow.
>

I wasn't thinking about removing can_attach() before I noticed this
issue.

But I was wondering if we can change the default value of
cpu.rt_runtime_us from 0 to -1? So by default the RT tasks can be
attached to a newly-created cgroup without users having to make any
configuration, and those tasks are confined by the parent cgroup, which
is what we have with cfs bw control.

This requires some changes to the code, but I guess it's do-able?
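To make the proposal concrete, here is a toy model of what a default of
-1 might mean. It is a sketch under stated assumptions, not the
scheduler's implementation: TaskGroup, rt_can_attach() and
effective_rt_runtime_us() are hypothetical names, -1 is read here as
"no limit of its own, confined by the nearest ancestor that sets one",
and 950000 is only an example value for the root group.

```python
from dataclasses import dataclass
from typing import Optional

NO_OWN_LIMIT = -1  # assumed meaning of -1 in this model


@dataclass
class TaskGroup:
    """Toy model of a task group (hypothetical, not a kernel structure)."""
    name: str
    rt_runtime_us: int
    parent: Optional["TaskGroup"] = None


def rt_can_attach(tg: TaskGroup) -> bool:
    """Refuse RT tasks only when the group has zero RT bandwidth."""
    return tg.rt_runtime_us != 0


def effective_rt_runtime_us(tg: TaskGroup) -> int:
    """Walk up to the nearest group that sets its own RT runtime."""
    node: Optional[TaskGroup] = tg
    while node is not None:
        if node.rt_runtime_us != NO_OWN_LIMIT:
            return node.rt_runtime_us
        node = node.parent
    return NO_OWN_LIMIT  # nothing in the chain sets a limit


root = TaskGroup("root", 950000)                  # example value only
today = TaskGroup("today", 0, parent=root)        # current default: 0
proposed = TaskGroup("proposed", NO_OWN_LIMIT, parent=root)

print(rt_can_attach(today), rt_can_attach(proposed))
print(effective_rt_runtime_us(proposed))
```

With a default of 0 the new group rejects RT tasks until someone
allocates bandwidth; with -1 in this model the attach succeeds and the
tasks are bounded by whatever the parent chain allows.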
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Mon, 2015-05-04 at 07:10 +0200, Mike Galbraith wrote:
> On Mon, 2015-05-04 at 12:39 +0800, Zefan Li wrote:
> > >> We are moving toward unified hierarchy where all the cgroup controllers
> > >> are bound together, so it would make cgroups easier to use if we have
> > >> less restrictions on attaching tasks between cgroups.
> > >
> > > Forcing group scheduling overhead on users if they want cpuset or memory
> > > cgroup functionality would be far from wonderful. Am I interpreting the
> > > implications of this unification/binding properly?
> > >
> > > (I hope not, surely the plan is not to utterly _destroy_ cgroup utility)
> > >
> >
> > Some degree of flexibility is provided so that you may disable some
> > controllers in a subtree. For example:
> >
> > root                  ---> child1
> > (cpuset,memory,cpu)        (cpuset,memory)
> >     \
> >      \-> child2
> >          (cpu)
>
> Whew, that's a relief. Thanks.

But somehow I'm not feeling a whole lot better.

"May" means if you don't explicitly take some action to disable group
scheduling, you get it (I don't care if I have an off button), but that
would also seemingly mean that we would then have rt tasks in taskgroups
with no bandwidth allocated, ie you have to make group scheduling for rt
tasks meaningless until a bandwidth appeared, and to make bandwidth
appear, you'd have to stop the world, distribute, continue, no?

The current "just say no" seems a lot more sensible.

	-Mike
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Mon, 2015-05-04 at 12:39 +0800, Zefan Li wrote:
> >> We are moving toward unified hierarchy where all the cgroup controllers
> >> are bound together, so it would make cgroups easier to use if we have less
> >> restrictions on attaching tasks between cgroups.
> >
> > Forcing group scheduling overhead on users if they want cpuset or memory
> > cgroup functionality would be far from wonderful. Am I interpreting the
> > implications of this unification/binding properly?
> >
> > (I hope not, surely the plan is not to utterly _destroy_ cgroup utility)
> >
>
> Some degree of flexibility is provided so that you may disable some
> controllers in a subtree. For example:
>
> root                  ---> child1
> (cpuset,memory,cpu)        (cpuset,memory)
>     \
>      \-> child2
>          (cpu)

Whew, that's a relief. Thanks.

	-Mike
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On 2015/5/4 11:13, Mike Galbraith wrote:
> On Mon, 2015-05-04 at 08:54 +0800, Zefan Li wrote:
>> It's allowed to promote a task from normal to realtime after it has been
>> attached to a non-root cgroup, but it will fail if the attaching happens
>> after it has become realtime. I don't see how this restriction is useful.
>
> In the CONFIG_RT_GROUP_SCHED case, promotion will fail if there is no
> bandwidth allocated.
>

Right. I forgot to mention this patch affects !CONFIG_RT_GROUP_SCHED
only, though it should be obvious by reading the change.

>> We are moving toward unified hierarchy where all the cgroup controllers
>> are bound together, so it would make cgroups easier to use if we have less
>> restrictions on attaching tasks between cgroups.
>
> Forcing group scheduling overhead on users if they want cpuset or memory
> cgroup functionality would be far from wonderful. Am I interpreting the
> implications of this unification/binding properly?
>
> (I hope not, surely the plan is not to utterly _destroy_ cgroup utility)
>

Some degree of flexibility is provided so that you may disable some
controllers in a subtree. For example:

root                  ---> child1
(cpuset,memory,cpu)        (cpuset,memory)
    \
     \-> child2
         (cpu)
Re: [PATCH] sched: Relax a restriction in sched_rt_can_attach()
On Mon, 2015-05-04 at 08:54 +0800, Zefan Li wrote:
> It's allowed to promote a task from normal to realtime after it has been
> attached to a non-root cgroup, but it will fail if the attaching happens
> after it has become realtime. I don't see how this restriction is useful.

In the CONFIG_RT_GROUP_SCHED case, promotion will fail if there is no
bandwidth allocated.

> We are moving toward unified hierarchy where all the cgroup controllers
> are bound together, so it would make cgroups easier to use if we have less
> restrictions on attaching tasks between cgroups.

Forcing group scheduling overhead on users if they want cpuset or memory
cgroup functionality would be far from wonderful. Am I interpreting the
implications of this unification/binding properly?

(I hope not, surely the plan is not to utterly _destroy_ cgroup utility)

	-Mike