Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, On Fri, Jun 02, 2017 at 04:36:22PM -0400, Waiman Long wrote: > I wouldn't argue further on that if you insist. However, I still want to Oh, please don't get me wrong. I'm not trying to shut down the discussion or anything. It's just that whole-scope discussions can get very meandering and time-consuming when these two issues can be decoupled from each other without compromising on either. Let's approach these issues separately. > relax the constraint somewhat by abandoning the no internal process > constraint when only threaded controllers (non-resource domains) are > enabled even when thread mode has not been explicitly enabled. It is a > modified version my second alternative. Now the question is which > controllers are considered to be resource domains. I think memory and > blkio are in the list. What else do you think should be considered > resource domains? And we're now a bit into repeating ourselves but for controlling of any significant resources (mostly cpu, memory, io), there gotta be significant portion of resource consumption which isn't tied to spcific processes or threads that should be accounted for. Both memory and io already do this to a certain extent, but not completely. cpu doesn't do it at all yet but we usually can't / shouldn't declare a resource category to be domain-free. There are exceptions - controllers which are only used for membership identification (perf and the old net controllers), pids which is explicitly tied to tasks (note that CPU cycles aren't), cpuset which is an attribute propagating / restricting controller. Out of those, the identification uses already aren't affected by the constraint as they're now all either direct membership test against the hierarchy or implicit controllers which aren't subject to the constraint. That leaves pids and cpuset. We can exempt them from the constraint but I'm not quite sure what that buys us given that neither is affected by requiring explicit leaf nodes. It'd just make the rules more complicated without actual benefits. That said, we can exempt those two. I don't see much point in it but we can definitely discuss the pros and cons, and it's likely that it's not gonna make much difference wichever way we choose. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 06/01/2017 05:18 PM, Tejun Heo wrote: > Hello, > > On Thu, Jun 01, 2017 at 05:12:42PM -0400, Waiman Long wrote: >> Are you referring to keeping the no internal process restriction and >> document how to work around that instead? I would like to hear what >> workarounds are currently being used. > What we've been talking about all along - just creating explicit leaf > nodes. > >> Anyway, you currently allow internal process in thread mode, but not in >> non-thread mode. I would prefer no such restriction in both thread and >> non-thread mode. > Heh, so, these aren't arbitrary. The contraint is tied to > implementing resource domains and thread subtree doesn't have resource > domains in them, so they don't need the constraint. I'm sorry about > the short replies but I'm kinda really tied up right now. I'm gonna > do the thread mode so that it can be agnostic w.r.t. the internal > process constraint and I think it could be helpful to decouple these > discussions. We've been having this discussion for a couple years now > and it looks like we're gonna go through it all over, which is fine, > but let's at least keep that separate. I wouldn't argue further on that if you insist. However, I still want to relax the constraint somewhat by abandoning the no internal process constraint when only threaded controllers (non-resource domains) are enabled even when thread mode has not been explicitly enabled. It is a modified version my second alternative. Now the question is which controllers are considered to be resource domains. I think memory and blkio are in the list. What else do you think should be considered resource domains? Cheers, Longman any of the resource domains (!threaded) controllers are enabled. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, On Thu, Jun 01, 2017 at 05:12:42PM -0400, Waiman Long wrote: > Are you referring to keeping the no internal process restriction and > document how to work around that instead? I would like to hear what > workarounds are currently being used. What we've been talking about all along - just creating explicit leaf nodes. > Anyway, you currently allow internal process in thread mode, but not in > non-thread mode. I would prefer no such restriction in both thread and > non-thread mode. Heh, so, these aren't arbitrary. The contraint is tied to implementing resource domains and thread subtree doesn't have resource domains in them, so they don't need the constraint. I'm sorry about the short replies but I'm kinda really tied up right now. I'm gonna do the thread mode so that it can be agnostic w.r.t. the internal process constraint and I think it could be helpful to decouple these discussions. We've been having this discussion for a couple years now and it looks like we're gonna go through it all over, which is fine, but let's at least keep that separate. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 06/01/2017 04:52 PM, Tejun Heo wrote: > Hello, > > On Thu, Jun 01, 2017 at 04:48:48PM -0400, Waiman Long wrote: >> I think we are on agreement here. I should we should just document how >> userland can work around the internal process competition issue by >> setting up the cgroup hierarchy properly. Then we can remove the no >> internal process constraint. > Heh, we agree on the immediate solution but not the final direction. > This requirement affects how controllers implement resource control in > significant ways. It is a restriction which can be worked around in > userland relatively easily. I'd much prefer to keep the invariant > intact. > > Thanks. > Are you referring to keeping the no internal process restriction and document how to work around that instead? I would like to hear what workarounds are currently being used. Anyway, you currently allow internal process in thread mode, but not in non-thread mode. I would prefer no such restriction in both thread and non-thread mode. Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, On Thu, Jun 01, 2017 at 04:48:48PM -0400, Waiman Long wrote: > I think we are on agreement here. I should we should just document how > userland can work around the internal process competition issue by > setting up the cgroup hierarchy properly. Then we can remove the no > internal process constraint. Heh, we agree on the immediate solution but not the final direction. This requirement affects how controllers implement resource control in significant ways. It is a restriction which can be worked around in userland relatively easily. I'd much prefer to keep the invariant intact. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 06/01/2017 04:38 PM, Tejun Heo wrote: > Hello, > > On Thu, Jun 01, 2017 at 03:27:35PM -0400, Waiman Long wrote: >> As said in an earlier email, I agreed that masking it on the kernel side >> may not be the best solution. I offer 2 other alternatives: >> 1) Document on how to work around the resource domains issue by proper >> setup of the cgroup hierarchy. > We can definitely improve documentation. > >> 2) Mark those controllers that require the no internal process >> competition constraint and disallow internal process only when those >> controllers are active. > We *can* do that but wouldn't this be equivalent to enabling thread > mode implicitly when only thread aware controllers are enabled? > >> I prefer the first alternative, but I can go with the second if necessary. >> >> The major rationale behind my enhanced thread mode patch was to allow >> something like >> >> R -- A -- B >> \ >> T1 -- T2 >> >> where you can have resource domain controllers enabled in the thread >> root as well as some child cgroups of the thread root. As no internal >> process rule is currently not applicable to the thread root, this >> creates the dilemma that we need to deal with internal process competition. >> >> The container invariant that PeterZ talked about will also be a serious >> issue here as I don't think we are going to set up a container root >> cgroup that will have no process allowed in it because it has some child >> cgroups. IMHO, I don't think cgroup v2 will get wide adoption without >> getting rid of that no internal process constraint. > The only thing which is necessary from inside a container is putting > the management processes into their own cgroups so that they can be > controlled (ie. the same thing you did with your patch but doing that > explicitly from userland) and userland management sw can do the same > thing whether it's inside a container or on a bare system. BTW, > systemd already does so and works completely fine in terms of > containerization on cgroup2. It is arguable whether we should make > this more convenient from kernel side but using cgroup2 for resource > control already requires the userspace tools to be adapted to it, so > I'm not sure how much benefit we'd gain from adding that compared to > explicitly documenting it. I think we are on agreement here. I should we should just document how userland can work around the internal process competition issue by setting up the cgroup hierarchy properly. Then we can remove the no internal process constraint. Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, On Thu, Jun 01, 2017 at 03:27:35PM -0400, Waiman Long wrote: > As said in an earlier email, I agreed that masking it on the kernel side > may not be the best solution. I offer 2 other alternatives: > 1) Document on how to work around the resource domains issue by proper > setup of the cgroup hierarchy. We can definitely improve documentation. > 2) Mark those controllers that require the no internal process > competition constraint and disallow internal process only when those > controllers are active. We *can* do that but wouldn't this be equivalent to enabling thread mode implicitly when only thread aware controllers are enabled? > I prefer the first alternative, but I can go with the second if necessary. > > The major rationale behind my enhanced thread mode patch was to allow > something like > > R -- A -- B > \ > T1 -- T2 > > where you can have resource domain controllers enabled in the thread > root as well as some child cgroups of the thread root. As no internal > process rule is currently not applicable to the thread root, this > creates the dilemma that we need to deal with internal process competition. > > The container invariant that PeterZ talked about will also be a serious > issue here as I don't think we are going to set up a container root > cgroup that will have no process allowed in it because it has some child > cgroups. IMHO, I don't think cgroup v2 will get wide adoption without > getting rid of that no internal process constraint. The only thing which is necessary from inside a container is putting the management processes into their own cgroups so that they can be controlled (ie. the same thing you did with your patch but doing that explicitly from userland) and userland management sw can do the same thing whether it's inside a container or on a bare system. BTW, systemd already does so and works completely fine in terms of containerization on cgroup2. It is arguable whether we should make this more convenient from kernel side but using cgroup2 for resource control already requires the userspace tools to be adapted to it, so I'm not sure how much benefit we'd gain from adding that compared to explicitly documenting it. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 06/01/2017 11:10 AM, Peter Zijlstra wrote: > On Thu, Jun 01, 2017 at 10:50:42AM -0400, Tejun Heo wrote: >> Hello, Waiman. >> >> A short update. I tried making root special while keeping the >> existing threaded semantics but I didn't really like it because we >> have to couple controller enables/disables with threaded >> enables/disables. I'm now trying a simpler, albeit a bit more >> tedious, approach which should leave things mostly symmetrical. I'm >> hoping to be able to post mostly working patches this week. > I've not had time to look at any of this. But the question I'm most > curious about is how cgroup-v2 preserves the container invariant. If you don't have much time to look at the patch, I will suggest just looking at the cover letter as well as changes to the cgroup-v2.txt file. You will get a pretty good overview of what this patchset is about. Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 06/01/2017 02:44 PM, Waiman Long wrote: > On 06/01/2017 11:10 AM, Peter Zijlstra wrote: >> On Thu, Jun 01, 2017 at 10:50:42AM -0400, Tejun Heo wrote: >>> Hello, Waiman. >>> >>> A short update. I tried making root special while keeping the >>> existing threaded semantics but I didn't really like it because we >>> have to couple controller enables/disables with threaded >>> enables/disables. I'm now trying a simpler, albeit a bit more >>> tedious, approach which should leave things mostly symmetrical. I'm >>> hoping to be able to post mostly working patches this week. >> I've not had time to look at any of this. But the question I'm most >> curious about is how cgroup-v2 preserves the container invariant. >> >> That is, each container (namespace) should look like a 'real' machine. >> So just like userns allows to have a uid-0 (aka root) for each container >> and pidns allows a pid-1 for each container, cgroupns should provide a >> root group for each container. >> >> And cgroup-v2 has this 'exception' (aka wart) for the root group which >> needs to be replicated for each namespace. > One of the changes that I proposed in my patches was to get rid of the > no internal process constraint. I think that will solve a big part of > the container invariant problem that we have with cgroup v2. > > Cheers, > Longman Another idea that I have to further solve this container invariant problem is do a cgroup setup like CP -- CR CP - container parent belong to the host CR - container root We can enable the pass-through mode at the subtree_control file of CP to force all CR controllers in pass-through mode. In this case, those controllers are not enabled in the CR like the root. However, the container can enable those in the child cgroups just like the root controller. By enabling those controller in the CP level, the host can control how much resource is being allowed in the container without the container being aware that its resources are being controlled as all the control knobs will show up in the CP, but not in CR. Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 06/01/2017 02:47 PM, Tejun Heo wrote: > Hello, Waiman. > > On Thu, Jun 01, 2017 at 02:44:48PM -0400, Waiman Long wrote: >>> And cgroup-v2 has this 'exception' (aka wart) for the root group which >>> needs to be replicated for each namespace. >> One of the changes that I proposed in my patches was to get rid of the >> no internal process constraint. I think that will solve a big part of >> the container invariant problem that we have with cgroup v2. > I'm not sure. It just masks it without actually solving it. I mean, > the constraint is thereq for a reason. "Solving" it would defeat one > of the main capabilities for resource domains and masking it from > kernel side doesn't make whole lot of sense to me given that it's > something which can be easily done from userland. If we take out that > part, for controllers which don't care about resource domains, > wouldn't thread mode be a sufficient solution? As said in an earlier email, I agreed that masking it on the kernel side may not be the best solution. I offer 2 other alternatives: 1) Document on how to work around the resource domains issue by proper setup of the cgroup hierarchy. 2) Mark those controllers that require the no internal process competition constraint and disallow internal process only when those controllers are active. I prefer the first alternative, but I can go with the second if necessary. The major rationale behind my enhanced thread mode patch was to allow something like R -- A -- B \ T1 -- T2 where you can have resource domain controllers enabled in the thread root as well as some child cgroups of the thread root. As no internal process rule is currently not applicable to the thread root, this creates the dilemma that we need to deal with internal process competition. The container invariant that PeterZ talked about will also be a serious issue here as I don't think we are going to set up a container root cgroup that will have no process allowed in it because it has some child cgroups. IMHO, I don't think cgroup v2 will get wide adoption without getting rid of that no internal process constraint. Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, Waiman. On Thu, Jun 01, 2017 at 02:44:48PM -0400, Waiman Long wrote: > > And cgroup-v2 has this 'exception' (aka wart) for the root group which > > needs to be replicated for each namespace. > > One of the changes that I proposed in my patches was to get rid of the > no internal process constraint. I think that will solve a big part of > the container invariant problem that we have with cgroup v2. I'm not sure. It just masks it without actually solving it. I mean, the constraint is thereq for a reason. "Solving" it would defeat one of the main capabilities for resource domains and masking it from kernel side doesn't make whole lot of sense to me given that it's something which can be easily done from userland. If we take out that part, for controllers which don't care about resource domains, wouldn't thread mode be a sufficient solution? Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 06/01/2017 11:10 AM, Peter Zijlstra wrote: > On Thu, Jun 01, 2017 at 10:50:42AM -0400, Tejun Heo wrote: >> Hello, Waiman. >> >> A short update. I tried making root special while keeping the >> existing threaded semantics but I didn't really like it because we >> have to couple controller enables/disables with threaded >> enables/disables. I'm now trying a simpler, albeit a bit more >> tedious, approach which should leave things mostly symmetrical. I'm >> hoping to be able to post mostly working patches this week. > I've not had time to look at any of this. But the question I'm most > curious about is how cgroup-v2 preserves the container invariant. > > That is, each container (namespace) should look like a 'real' machine. > So just like userns allows to have a uid-0 (aka root) for each container > and pidns allows a pid-1 for each container, cgroupns should provide a > root group for each container. > > And cgroup-v2 has this 'exception' (aka wart) for the root group which > needs to be replicated for each namespace. One of the changes that I proposed in my patches was to get rid of the no internal process constraint. I think that will solve a big part of the container invariant problem that we have with cgroup v2. Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 06/01/2017 10:50 AM, Tejun Heo wrote: > Hello, Waiman. > > A short update. I tried making root special while keeping the > existing threaded semantics but I didn't really like it because we > have to couple controller enables/disables with threaded > enables/disables. I'm now trying a simpler, albeit a bit more > tedious, approach which should leave things mostly symmetrical. I'm > hoping to be able to post mostly working patches this week. I am looking forward to your patches. > Also, do you mind posting the debug patches as a separate series? > Let's get the bits which make sense indepdently in the tree. I am going to do that. The debug patches, however, will have dependency on other cgroup patches and so will need to be posted after the core patches. Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, Peter. On Thu, Jun 01, 2017 at 05:10:45PM +0200, Peter Zijlstra wrote: > I've not had time to look at any of this. But the question I'm most > curious about is how cgroup-v2 preserves the container invariant. > > That is, each container (namespace) should look like a 'real' machine. > So just like userns allows to have a uid-0 (aka root) for each container > and pidns allows a pid-1 for each container, cgroupns should provide a > root group for each container. > > And cgroup-v2 has this 'exception' (aka wart) for the root group which > needs to be replicated for each namespace. The goal has never been that a container must be indistinguishible from a real machine. For certain things, things simply don't have exact equivalents due to sharing (memory stats or journal writes for example) and those things are exactly why people prefer containers over VMs for certain use cases. If one wants full replication, VM would be the way to go. The goal is allowing enough container invariant so that appropriate workloads can be contained and co-exist in useful ways. This also means that the contained workload is usually either a bit illiterate w.r.t. to the system details (doesn't care) or makes some adjustments for running inside a container (most quasi-full-system ones already do). System root is inherently different from all other nested roots. Making some exceptions for the root isn't about taking away from other roots but more reflecting the inherent differences - there are things which are inherently system / bare-metal. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On Thu, Jun 01, 2017 at 10:50:42AM -0400, Tejun Heo wrote: > Hello, Waiman. > > A short update. I tried making root special while keeping the > existing threaded semantics but I didn't really like it because we > have to couple controller enables/disables with threaded > enables/disables. I'm now trying a simpler, albeit a bit more > tedious, approach which should leave things mostly symmetrical. I'm > hoping to be able to post mostly working patches this week. I've not had time to look at any of this. But the question I'm most curious about is how cgroup-v2 preserves the container invariant. That is, each container (namespace) should look like a 'real' machine. So just like userns allows to have a uid-0 (aka root) for each container and pidns allows a pid-1 for each container, cgroupns should provide a root group for each container. And cgroup-v2 has this 'exception' (aka wart) for the root group which needs to be replicated for each namespace. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, Waiman. A short update. I tried making root special while keeping the existing threaded semantics but I didn't really like it because we have to couple controller enables/disables with threaded enables/disables. I'm now trying a simpler, albeit a bit more tedious, approach which should leave things mostly symmetrical. I'm hoping to be able to post mostly working patches this week. Also, do you mind posting the debug patches as a separate series? Let's get the bits which make sense indepdently in the tree. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, On Wed, May 24, 2017 at 05:17:13PM -0400, Waiman Long wrote: > An alternative is to have separate enabling for thread root. For example, > > # echo root > cgroup.threads > # echo enable > child/cgroup.threads > > The first statement make the current cgroup the thread root. However, > setting it to a thread root doesn't make its child to be threaded. This > have to be explicitly done on each of the children. Once a child cgroup > is made to be threaded, all its descendants will be threaded. That will > have the same effect as the current patch. Yeah, I'm toying with different ideas. I'll get back to you once things get more concrete. > With delegation, do you mean the relationship between a container and > its host? It can be but doesn't have to be. For example, it can be delegations to users without namespace / container being involved. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 05/24/2017 04:36 PM, Tejun Heo wrote: > Hello, Waiman. > > On Mon, May 22, 2017 at 01:13:16PM -0400, Waiman Long wrote: >>> Maybe I'm misunderstanding the design, but this seems to push the >>> processes which belong to the threaded subtree to the parent which is >>> part of the usual resource domain hierarchy thus breaking the no >>> internal competition constraint. I'm not sure this is something we'd >>> want. Given that the limitation of the original threaded mode was the >>> required nesting below root and that we treat root special anyway >>> (exactly in the way necessary), I wonder whether it'd be better to >>> simply allow root to be both domain and thread root. >> Yes, root can be both domain and thread root. I haven't placed any >> restriction on that. > I've been playing with the proposed "make the parent resource domain". > Unfortunately, the parent - child relationship becomes weird. > > The parent becomes the thread root, which means that its > cgroup.threads file becomes writable and threads can be put in there. > It's really weird to write to a child's interface and have the > parent's behavior changed. This becomes weirder with delegation. If > a cgroup is delegated, its cgroup.threads should be delegated too but > if the child enables threaded mode, that makes the undelegated parent > thread root, which means that either 1. the delegatee can't migrate > threads to the thread root or 2. if the parent's cgroup.threads is > writeable, the delegatee can mass with other descendants under it > which shouldn't be allowed. > > I think the operation of making a cgroup a thread root should happen > on the cgroup where that's requested; otherwise, nesting becomes too > twisted. This should be solvable. Will think more about it. > > Thanks. > An alternative is to have separate enabling for thread root. For example, # echo root > cgroup.threads # echo enable > child/cgroup.threads The first statement make the current cgroup the thread root. However, setting it to a thread root doesn't make its child to be threaded. This have to be explicitly done on each of the children. Once a child cgroup is made to be threaded, all its descendants will be threaded. That will have the same effect as the current patch. With delegation, do you mean the relationship between a container and its host? Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, Waiman. On Mon, May 22, 2017 at 01:13:16PM -0400, Waiman Long wrote: > > Maybe I'm misunderstanding the design, but this seems to push the > > processes which belong to the threaded subtree to the parent which is > > part of the usual resource domain hierarchy thus breaking the no > > internal competition constraint. I'm not sure this is something we'd > > want. Given that the limitation of the original threaded mode was the > > required nesting below root and that we treat root special anyway > > (exactly in the way necessary), I wonder whether it'd be better to > > simply allow root to be both domain and thread root. > > Yes, root can be both domain and thread root. I haven't placed any > restriction on that. I've been playing with the proposed "make the parent resource domain". Unfortunately, the parent - child relationship becomes weird. The parent becomes the thread root, which means that its cgroup.threads file becomes writable and threads can be put in there. It's really weird to write to a child's interface and have the parent's behavior changed. This becomes weirder with delegation. If a cgroup is delegated, its cgroup.threads should be delegated too but if the child enables threaded mode, that makes the undelegated parent thread root, which means that either 1. the delegatee can't migrate threads to the thread root or 2. if the parent's cgroup.threads is writeable, the delegatee can mass with other descendants under it which shouldn't be allowed. I think the operation of making a cgroup a thread root should happen on the cgroup where that's requested; otherwise, nesting becomes too twisted. This should be solvable. Will think more about it. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 05/22/2017 01:13 PM, Waiman Long wrote: > On 05/19/2017 04:26 PM, Tejun Heo wrote: >>> @@ -2982,22 +3010,48 @@ static int cgroup_enable_threaded(struct cgroup >>> *cgrp) >>> LIST_HEAD(csets); >>> struct cgrp_cset_link *link; >>> struct css_set *cset, *cset_next; >>> + struct cgroup *child; >>> int ret; >>> + u16 ss_mask; >>> >>> lockdep_assert_held(&cgroup_mutex); >>> >>> /* noop if already threaded */ >>> - if (cgrp->proc_cgrp) >>> + if (cgroup_is_threaded(cgrp)) >>> return 0; >>> >>> - /* allow only if there are neither children or enabled controllers */ >>> - if (css_has_online_children(&cgrp->self) || cgrp->subtree_control) >>> + /* >>> +* Allow only if it is not the root and there are: >>> +* 1) no children, >>> +* 2) no non-threaded controllers are enabled, and >>> +* 3) no attached tasks. >>> +* >>> +* With no attached tasks, it is assumed that no css_sets will be >>> +* linked to the current cgroup. This may not be true if some dead >>> +* css_sets linger around due to task_struct leakage, for example. >>> +*/ >> It doesn't look like the code is actually making this (incorrect) >> assumption. I suppose the comment is from before >> cgroup_is_populated() was added? > Yes, it is a bug. I should have checked the tasks_count instead of using > cgroup_is_populated. Thanks for catching that. Sorry, I would like to take it back. I think cgroup_is_populated() will be set if there is any task attached to the cgroup. So I think it is doing the right thing with regard to (3). Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 05/19/2017 04:26 PM, Tejun Heo wrote: > Hello, Waiman. > > On Mon, May 15, 2017 at 09:34:10AM -0400, Waiman Long wrote: >> Now we could have something like >> >> R -- A -- B >> \ >>T1 -- T2 >> >> where R is the thread root, A and B are non-threaded cgroups, T1 and >> T2 are threaded cgroups. The cgroups R, T1, T2 form a threaded subtree >> where all the non-threaded resources are accounted for in R. The no >> internal process constraint does not apply in the threaded subtree. >> Non-threaded controllers need to properly handle the competition >> between internal processes and child cgroups at the thread root. >> >> This model will be flexible enough to support the need of the threaded >> controllers. > Maybe I'm misunderstanding the design, but this seems to push the > processes which belong to the threaded subtree to the parent which is > part of the usual resource domain hierarchy thus breaking the no > internal competition constraint. I'm not sure this is something we'd > want. Given that the limitation of the original threaded mode was the > required nesting below root and that we treat root special anyway > (exactly in the way necessary), I wonder whether it'd be better to > simply allow root to be both domain and thread root. Yes, root can be both domain and thread root. I haven't placed any restriction on that. > > Specific review points below but we'd probably want to discuss the > overall design first. > >> +static inline bool cgroup_is_threaded(const struct cgroup *cgrp) >> +{ >> +return cgrp->proc_cgrp && (cgrp->proc_cgrp != cgrp); >> +} >> + >> +static inline bool cgroup_is_thread_root(const struct cgroup *cgrp) >> +{ >> +return cgrp->proc_cgrp == cgrp; >> +} > Maybe add a bit of comments explaining what's going on with > ->proc_cgrp? Sure, will do that. >> /** >> + * threaded_children_count - returns # of threaded children >> + * @cgrp: cgroup to be tested >> + * >> + * cgroup_mutex must be held by the caller. >> + */ >> +static int threaded_children_count(struct cgroup *cgrp) >> +{ >> +struct cgroup *child; >> +int count = 0; >> + >> +lockdep_assert_held(&cgroup_mutex); >> +cgroup_for_each_live_child(child, cgrp) >> +if (cgroup_is_threaded(child)) >> +count++; >> +return count; >> +} > It probably would be a good idea to keep track of the count so that we > don't have to count them each time. There are cases where people end > up creating a very high number of cgroups and we've already been > bitten a couple times with silly complexity issues. Thanks for the suggestion, I can keep a count in the cgroup strcture to avoid doing that repetitively. > >> @@ -2982,22 +3010,48 @@ static int cgroup_enable_threaded(struct cgroup >> *cgrp) >> LIST_HEAD(csets); >> struct cgrp_cset_link *link; >> struct css_set *cset, *cset_next; >> +struct cgroup *child; >> int ret; >> +u16 ss_mask; >> >> lockdep_assert_held(&cgroup_mutex); >> >> /* noop if already threaded */ >> -if (cgrp->proc_cgrp) >> +if (cgroup_is_threaded(cgrp)) >> return 0; >> >> -/* allow only if there are neither children or enabled controllers */ >> -if (css_has_online_children(&cgrp->self) || cgrp->subtree_control) >> +/* >> + * Allow only if it is not the root and there are: >> + * 1) no children, >> + * 2) no non-threaded controllers are enabled, and >> + * 3) no attached tasks. >> + * >> + * With no attached tasks, it is assumed that no css_sets will be >> + * linked to the current cgroup. This may not be true if some dead >> + * css_sets linger around due to task_struct leakage, for example. >> + */ > It doesn't look like the code is actually making this (incorrect) > assumption. I suppose the comment is from before > cgroup_is_populated() was added? Yes, it is a bug. I should have checked the tasks_count instead of using cgroup_is_populated. Thanks for catching that. > >> spin_lock_irq(&css_set_lock); >> list_for_each_entry(link, &cgrp->cset_links, cset_link) { >> cset = link->cset; >> +if (cset->dead) >> +continue; > Hmm... is this a bug fix which is necessary regardless of whether we > change the threadroot semantics or not? That is true. I put it there because the the reference counting bug fixed in patch 6 caused a lot of dead csets hanging around before the fix. I can pull this out as a separate patch. Cheers, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, On Fri, May 19, 2017 at 04:26:24PM -0400, Tejun Heo wrote: > (exactly in the way necessary), I wonder whether it'd be better to > simply allow root to be both domain and thread root. I'll give this approach a shot early next week. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, Waiman. On Mon, May 15, 2017 at 09:34:10AM -0400, Waiman Long wrote: > Now we could have something like > > R -- A -- B >\ > T1 -- T2 > > where R is the thread root, A and B are non-threaded cgroups, T1 and > T2 are threaded cgroups. The cgroups R, T1, T2 form a threaded subtree > where all the non-threaded resources are accounted for in R. The no > internal process constraint does not apply in the threaded subtree. > Non-threaded controllers need to properly handle the competition > between internal processes and child cgroups at the thread root. > > This model will be flexible enough to support the need of the threaded > controllers. Maybe I'm misunderstanding the design, but this seems to push the processes which belong to the threaded subtree to the parent which is part of the usual resource domain hierarchy thus breaking the no internal competition constraint. I'm not sure this is something we'd want. Given that the limitation of the original threaded mode was the required nesting below root and that we treat root special anyway (exactly in the way necessary), I wonder whether it'd be better to simply allow root to be both domain and thread root. Specific review points below but we'd probably want to discuss the overall design first. > +static inline bool cgroup_is_threaded(const struct cgroup *cgrp) > +{ > + return cgrp->proc_cgrp && (cgrp->proc_cgrp != cgrp); > +} > + > +static inline bool cgroup_is_thread_root(const struct cgroup *cgrp) > +{ > + return cgrp->proc_cgrp == cgrp; > +} Maybe add a bit of comments explaining what's going on with ->proc_cgrp? > /** > + * threaded_children_count - returns # of threaded children > + * @cgrp: cgroup to be tested > + * > + * cgroup_mutex must be held by the caller. > + */ > +static int threaded_children_count(struct cgroup *cgrp) > +{ > + struct cgroup *child; > + int count = 0; > + > + lockdep_assert_held(&cgroup_mutex); > + cgroup_for_each_live_child(child, cgrp) > + if (cgroup_is_threaded(child)) > + count++; > + return count; > +} It probably would be a good idea to keep track of the count so that we don't have to count them each time. There are cases where people end up creating a very high number of cgroups and we've already been bitten a couple times with silly complexity issues. > @@ -2982,22 +3010,48 @@ static int cgroup_enable_threaded(struct cgroup *cgrp) > LIST_HEAD(csets); > struct cgrp_cset_link *link; > struct css_set *cset, *cset_next; > + struct cgroup *child; > int ret; > + u16 ss_mask; > > lockdep_assert_held(&cgroup_mutex); > > /* noop if already threaded */ > - if (cgrp->proc_cgrp) > + if (cgroup_is_threaded(cgrp)) > return 0; > > - /* allow only if there are neither children or enabled controllers */ > - if (css_has_online_children(&cgrp->self) || cgrp->subtree_control) > + /* > + * Allow only if it is not the root and there are: > + * 1) no children, > + * 2) no non-threaded controllers are enabled, and > + * 3) no attached tasks. > + * > + * With no attached tasks, it is assumed that no css_sets will be > + * linked to the current cgroup. This may not be true if some dead > + * css_sets linger around due to task_struct leakage, for example. > + */ It doesn't look like the code is actually making this (incorrect) assumption. I suppose the comment is from before cgroup_is_populated() was added? > spin_lock_irq(&css_set_lock); > list_for_each_entry(link, &cgrp->cset_links, cset_link) { > cset = link->cset; > + if (cset->dead) > + continue; Hmm... is this a bug fix which is necessary regardless of whether we change the threadroot semantics or not? Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
On 05/17/2017 05:47 PM, Tejun Heo wrote: > Hello, Waiman. > > On Mon, May 15, 2017 at 09:34:10AM -0400, Waiman Long wrote: >> The current thread mode semantics aren't sufficient to fully support >> threaded controllers like cpu. The main problem is that when thread >> mode is enabled at root (mainly for performance reason), all the >> non-threaded controllers cannot be supported at all. >> >> To alleviate this problem, the roles of thread root and threaded >> cgroups are now further separated. Now thread mode can only be enabled >> on a non-root leaf cgroup whose parent will then become the thread >> root. All the descendants of a threaded cgroup will still need to be >> threaded. All the non-threaded resource will be accounted for in the >> thread root. Unlike the previous thread mode, however, a thread root >> can have non-threaded children where system resources like memory >> can be further split down the hierarchy. >> >> Now we could have something like >> >> R -- A -- B >> \ >>T1 -- T2 >> >> where R is the thread root, A and B are non-threaded cgroups, T1 and >> T2 are threaded cgroups. The cgroups R, T1, T2 form a threaded subtree >> where all the non-threaded resources are accounted for in R. The no >> internal process constraint does not apply in the threaded subtree. >> Non-threaded controllers need to properly handle the competition >> between internal processes and child cgroups at the thread root. >> >> This model will be flexible enough to support the need of the threaded >> controllers. > I do like the approach and it does address the issue with requiring at > least one level of nesting for the thread mode to be used with other > controllers. I need to think a bit more about it and mull over what > Peterz was suggesting in the old thread. I'll get back to you soon > but I'd really prefer this and the earlier related patches to be in > its own patchset so that we aren't dealing with different things at > the same time. > > Thanks. > I have studied the email exchanges with your original thread mode patchset. This patchset is aimed to hopefully address all the concerns that Peterz has. This enhanced thread mode should address a big part of the concern. However, I am not sure if this patch, by itself, is enough to address all his concerns. That is why I also include 2 other major changes in the next 2 patches. My goal is to move forward to allow all controllers to be enabled for v2 eventually. We are not there yet, but I hope this patchset can move thing forward meaningfully. Regards, Longman -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
Hello, Waiman. On Mon, May 15, 2017 at 09:34:10AM -0400, Waiman Long wrote: > The current thread mode semantics aren't sufficient to fully support > threaded controllers like cpu. The main problem is that when thread > mode is enabled at root (mainly for performance reason), all the > non-threaded controllers cannot be supported at all. > > To alleviate this problem, the roles of thread root and threaded > cgroups are now further separated. Now thread mode can only be enabled > on a non-root leaf cgroup whose parent will then become the thread > root. All the descendants of a threaded cgroup will still need to be > threaded. All the non-threaded resource will be accounted for in the > thread root. Unlike the previous thread mode, however, a thread root > can have non-threaded children where system resources like memory > can be further split down the hierarchy. > > Now we could have something like > > R -- A -- B >\ > T1 -- T2 > > where R is the thread root, A and B are non-threaded cgroups, T1 and > T2 are threaded cgroups. The cgroups R, T1, T2 form a threaded subtree > where all the non-threaded resources are accounted for in R. The no > internal process constraint does not apply in the threaded subtree. > Non-threaded controllers need to properly handle the competition > between internal processes and child cgroups at the thread root. > > This model will be flexible enough to support the need of the threaded > controllers. I do like the approach and it does address the issue with requiring at least one level of nesting for the thread mode to be used with other controllers. I need to think a bit more about it and mull over what Peterz was suggesting in the old thread. I'll get back to you soon but I'd really prefer this and the earlier related patches to be in its own patchset so that we aren't dealing with different things at the same time. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics
The current thread mode semantics aren't sufficient to fully support threaded controllers like cpu. The main problem is that when thread mode is enabled at root (mainly for performance reason), all the non-threaded controllers cannot be supported at all. To alleviate this problem, the roles of thread root and threaded cgroups are now further separated. Now thread mode can only be enabled on a non-root leaf cgroup whose parent will then become the thread root. All the descendants of a threaded cgroup will still need to be threaded. All the non-threaded resource will be accounted for in the thread root. Unlike the previous thread mode, however, a thread root can have non-threaded children where system resources like memory can be further split down the hierarchy. Now we could have something like R -- A -- B \ T1 -- T2 where R is the thread root, A and B are non-threaded cgroups, T1 and T2 are threaded cgroups. The cgroups R, T1, T2 form a threaded subtree where all the non-threaded resources are accounted for in R. The no internal process constraint does not apply in the threaded subtree. Non-threaded controllers need to properly handle the competition between internal processes and child cgroups at the thread root. This model will be flexible enough to support the need of the threaded controllers. Signed-off-by: Waiman Long --- Documentation/cgroup-v2.txt | 51 +++ kernel/cgroup/cgroup-internal.h | 10 +++ kernel/cgroup/cgroup.c | 186 +++- 3 files changed, 209 insertions(+), 38 deletions(-) diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt index 1c6f5a9..3ae7e9c 100644 --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt @@ -222,21 +222,32 @@ process can be put in different cgroups and are not subject to the no internal process constraint - threaded controllers can be enabled on non-leaf cgroups whether they have threads in them or not. -To enable the thread mode, the following conditions must be met. +To enable the thread mode on a cgroup, the following conditions must +be met. -- The thread root doesn't have any child cgroups. +- The cgroup doesn't have any child cgroups. -- The thread root doesn't have any controllers enabled. +- The cgroup doesn't have any non-threaded controllers enabled. + +- The cgroup doesn't have any processes attached to it. Thread mode can be enabled by writing "enable" to "cgroup.threads" file. # echo enable > cgroup.threads -Inside a threaded subtree, "cgroup.threads" can be read and contains -the list of the thread IDs of all threads in the cgroup. Except that -the operations are per-thread instead of per-process, "cgroup.threads" -has the same format and behaves the same way as "cgroup.procs". +The parent of the threaded cgroup will become the thread root, if +it hasn't been a thread root yet. In other word, thread mode cannot +be enabled on the root cgroup as it doesn't have a parent cgroup. A +thread root can have child cgroups and controllers enabled before +becoming one. + +A threaded subtree includes the thread root and all the threaded child +cgroups as well as their descendants which are all threaded cgroups. +"cgroup.threads" can be read and contains the list of the thread +IDs of all threads in the cgroup. Except that the operations are +per-thread instead of per-process, "cgroup.threads" has the same +format and behaves the same way as "cgroup.procs". The thread root serves as the resource domain for the whole subtree, and, while the threads can be scattered across the subtree, all the @@ -246,25 +257,30 @@ not readable in the subtree proper. However, "cgroup.procs" can be written to from anywhere in the subtree to migrate all threads of the matching process to the cgroup. -Only threaded controllers can be enabled in a threaded subtree. When -a threaded controller is enabled inside a threaded subtree, it only -accounts for and controls resource consumptions associated with the -threads in the cgroup and its descendants. All consumptions which -aren't tied to a specific thread belong to the thread root. +Only threaded controllers can be enabled in a non-root threaded cgroup. +When a threaded controller is enabled inside a threaded subtree, +it only accounts for and controls resource consumptions associated +with the threads in the cgroup and its descendants. All consumptions +which aren't tied to a specific thread belong to the thread root. Because a threaded subtree is exempt from no internal process constraint, a threaded controller must be able to handle competition between threads in a non-leaf cgroup and its child cgroups. Each threaded controller defines how such competitions are handled. +A new child cgroup created under a thread root will not be threaded. +Thread mode has to be explicitly enabled on each of the thread root's +children. Descendants of a threaded cgroup, however, will a