Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-03 Thread Tejun Heo
Hello,

On Fri, Jun 02, 2017 at 04:36:22PM -0400, Waiman Long wrote:
> I wouldn't argue further on that if you insist. However, I still want to

Oh, please don't get me wrong.  I'm not trying to shut down the
discussion or anything.  It's just that whole-scope discussions can
get very meandering and time-consuming when these two issues can be
decoupled from each other without compromising on either.  Let's
approach these issues separately.

> relax the constraint somewhat by abandoning the no internal process
> constraint  when only threaded controllers (non-resource domains) are
> enabled even when thread mode has not been explicitly enabled. It is a
> modified version my second alternative. Now the question is which
> controllers are considered to be resource domains. I think memory and
> blkio are in the list. What else do you think should be considered
> resource domains?

And we're now a bit into repeating ourselves but for controlling of
any significant resources (mostly cpu, memory, io), there gotta be
significant portion of resource consumption which isn't tied to
spcific processes or threads that should be accounted for.  Both
memory and io already do this to a certain extent, but not completely.
cpu doesn't do it at all yet but we usually can't / shouldn't declare
a resource category to be domain-free.

There are exceptions - controllers which are only used for membership
identification (perf and the old net controllers), pids which is
explicitly tied to tasks (note that CPU cycles aren't), cpuset which
is an attribute propagating / restricting controller.

Out of those, the identification uses already aren't affected by the
constraint as they're now all either direct membership test against
the hierarchy or implicit controllers which aren't subject to the
constraint.  That leaves pids and cpuset.  We can exempt them from the
constraint but I'm not quite sure what that buys us given that neither
is affected by requiring explicit leaf nodes.  It'd just make the
rules more complicated without actual benefits.

That said, we can exempt those two.  I don't see much point in it but
we can definitely discuss the pros and cons, and it's likely that it's
not gonna make much difference wichever way we choose.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-02 Thread Waiman Long
On 06/01/2017 05:18 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Jun 01, 2017 at 05:12:42PM -0400, Waiman Long wrote:
>> Are you referring to keeping the no internal process restriction and
>> document how to work around that instead? I would like to hear what
>> workarounds are currently being used.
> What we've been talking about all along - just creating explicit leaf
> nodes.
>
>> Anyway, you currently allow internal process in thread mode, but not in
>> non-thread mode. I would prefer no such restriction in both thread and
>> non-thread mode.
> Heh, so, these aren't arbitrary.  The contraint is tied to
> implementing resource domains and thread subtree doesn't have resource
> domains in them, so they don't need the constraint.  I'm sorry about
> the short replies but I'm kinda really tied up right now.  I'm gonna
> do the thread mode so that it can be agnostic w.r.t. the internal
> process constraint and I think it could be helpful to decouple these
> discussions.  We've been having this discussion for a couple years now
> and it looks like we're gonna go through it all over, which is fine,
> but let's at least keep that separate.

I wouldn't argue further on that if you insist. However, I still want to
relax the constraint somewhat by abandoning the no internal process
constraint  when only threaded controllers (non-resource domains) are
enabled even when thread mode has not been explicitly enabled. It is a
modified version my second alternative. Now the question is which
controllers are considered to be resource domains. I think memory and
blkio are in the list. What else do you think should be considered
resource domains?

Cheers,
Longman



any of the resource domains (!threaded) controllers are enabled.

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Tejun Heo
Hello,

On Thu, Jun 01, 2017 at 05:12:42PM -0400, Waiman Long wrote:
> Are you referring to keeping the no internal process restriction and
> document how to work around that instead? I would like to hear what
> workarounds are currently being used.

What we've been talking about all along - just creating explicit leaf
nodes.

> Anyway, you currently allow internal process in thread mode, but not in
> non-thread mode. I would prefer no such restriction in both thread and
> non-thread mode.

Heh, so, these aren't arbitrary.  The contraint is tied to
implementing resource domains and thread subtree doesn't have resource
domains in them, so they don't need the constraint.  I'm sorry about
the short replies but I'm kinda really tied up right now.  I'm gonna
do the thread mode so that it can be agnostic w.r.t. the internal
process constraint and I think it could be helpful to decouple these
discussions.  We've been having this discussion for a couple years now
and it looks like we're gonna go through it all over, which is fine,
but let's at least keep that separate.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Waiman Long
On 06/01/2017 04:52 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Jun 01, 2017 at 04:48:48PM -0400, Waiman Long wrote:
>> I think we are on agreement here. I should we should just document how
>> userland can work around the internal process competition issue by
>> setting up the cgroup hierarchy properly. Then we can remove the no
>> internal process constraint.
> Heh, we agree on the immediate solution but not the final direction.
> This requirement affects how controllers implement resource control in
> significant ways.  It is a restriction which can be worked around in
> userland relatively easily.  I'd much prefer to keep the invariant
> intact.
>
> Thanks.
>
Are you referring to keeping the no internal process restriction and
document how to work around that instead? I would like to hear what
workarounds are currently being used.

Anyway, you currently allow internal process in thread mode, but not in
non-thread mode. I would prefer no such restriction in both thread and
non-thread mode.

Cheers,
Longman


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Tejun Heo
Hello,

On Thu, Jun 01, 2017 at 04:48:48PM -0400, Waiman Long wrote:
> I think we are on agreement here. I should we should just document how
> userland can work around the internal process competition issue by
> setting up the cgroup hierarchy properly. Then we can remove the no
> internal process constraint.

Heh, we agree on the immediate solution but not the final direction.
This requirement affects how controllers implement resource control in
significant ways.  It is a restriction which can be worked around in
userland relatively easily.  I'd much prefer to keep the invariant
intact.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Waiman Long
On 06/01/2017 04:38 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Jun 01, 2017 at 03:27:35PM -0400, Waiman Long wrote:
>> As said in an earlier email, I agreed that masking it on the kernel side
>> may not be the best solution. I offer 2 other alternatives:
>> 1) Document on how to work around the resource domains issue by proper
>> setup of the cgroup hierarchy.
> We can definitely improve documentation.
>
>> 2) Mark those controllers that require the no internal process
>> competition constraint and disallow internal process only when those
>> controllers are active.
> We *can* do that but wouldn't this be equivalent to enabling thread
> mode implicitly when only thread aware controllers are enabled?
>
>> I prefer the first alternative, but I can go with the second if necessary.
>>
>> The major rationale behind my enhanced thread mode patch was to allow
>> something like
>>
>>  R -- A -- B
>>  \
>>   T1 -- T2
>>
>> where you can have resource domain controllers enabled in the thread
>> root as well as some child cgroups of the thread root. As no internal
>> process rule is currently not applicable to the thread root, this
>> creates the dilemma that we need to deal with internal process competition.
>>
>> The container invariant that PeterZ talked about will also be a serious
>> issue here as I don't think we are going to set up a container root
>> cgroup that will have no process allowed in it because it has some child
>> cgroups. IMHO, I don't think cgroup v2 will get wide adoption without
>> getting rid of that no internal process constraint.
> The only thing which is necessary from inside a container is putting
> the management processes into their own cgroups so that they can be
> controlled (ie. the same thing you did with your patch but doing that
> explicitly from userland) and userland management sw can do the same
> thing whether it's inside a container or on a bare system.  BTW,
> systemd already does so and works completely fine in terms of
> containerization on cgroup2.  It is arguable whether we should make
> this more convenient from kernel side but using cgroup2 for resource
> control already requires the userspace tools to be adapted to it, so
> I'm not sure how much benefit we'd gain from adding that compared to
> explicitly documenting it.

I think we are on agreement here. I should we should just document how
userland can work around the internal process competition issue by
setting up the cgroup hierarchy properly. Then we can remove the no
internal process constraint.

Cheers,
Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Tejun Heo
Hello,

On Thu, Jun 01, 2017 at 03:27:35PM -0400, Waiman Long wrote:
> As said in an earlier email, I agreed that masking it on the kernel side
> may not be the best solution. I offer 2 other alternatives:
> 1) Document on how to work around the resource domains issue by proper
> setup of the cgroup hierarchy.

We can definitely improve documentation.

> 2) Mark those controllers that require the no internal process
> competition constraint and disallow internal process only when those
> controllers are active.

We *can* do that but wouldn't this be equivalent to enabling thread
mode implicitly when only thread aware controllers are enabled?

> I prefer the first alternative, but I can go with the second if necessary.
> 
> The major rationale behind my enhanced thread mode patch was to allow
> something like
> 
>  R -- A -- B
>  \
>   T1 -- T2
> 
> where you can have resource domain controllers enabled in the thread
> root as well as some child cgroups of the thread root. As no internal
> process rule is currently not applicable to the thread root, this
> creates the dilemma that we need to deal with internal process competition.
> 
> The container invariant that PeterZ talked about will also be a serious
> issue here as I don't think we are going to set up a container root
> cgroup that will have no process allowed in it because it has some child
> cgroups. IMHO, I don't think cgroup v2 will get wide adoption without
> getting rid of that no internal process constraint.

The only thing which is necessary from inside a container is putting
the management processes into their own cgroups so that they can be
controlled (ie. the same thing you did with your patch but doing that
explicitly from userland) and userland management sw can do the same
thing whether it's inside a container or on a bare system.  BTW,
systemd already does so and works completely fine in terms of
containerization on cgroup2.  It is arguable whether we should make
this more convenient from kernel side but using cgroup2 for resource
control already requires the userspace tools to be adapted to it, so
I'm not sure how much benefit we'd gain from adding that compared to
explicitly documenting it.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Waiman Long
On 06/01/2017 11:10 AM, Peter Zijlstra wrote:
> On Thu, Jun 01, 2017 at 10:50:42AM -0400, Tejun Heo wrote:
>> Hello, Waiman.
>>
>> A short update.  I tried making root special while keeping the
>> existing threaded semantics but I didn't really like it because we
>> have to couple controller enables/disables with threaded
>> enables/disables.  I'm now trying a simpler, albeit a bit more
>> tedious, approach which should leave things mostly symmetrical.  I'm
>> hoping to be able to post mostly working patches this week.
> I've not had time to look at any of this. But the question I'm most
> curious about is how cgroup-v2 preserves the container invariant.

If you don't have much time to look at the patch, I will suggest just
looking at the cover letter as well as changes to the cgroup-v2.txt
file. You will get a pretty good overview of what this patchset is about.

Cheers,
Longman


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Waiman Long
On 06/01/2017 02:44 PM, Waiman Long wrote:
> On 06/01/2017 11:10 AM, Peter Zijlstra wrote:
>> On Thu, Jun 01, 2017 at 10:50:42AM -0400, Tejun Heo wrote:
>>> Hello, Waiman.
>>>
>>> A short update.  I tried making root special while keeping the
>>> existing threaded semantics but I didn't really like it because we
>>> have to couple controller enables/disables with threaded
>>> enables/disables.  I'm now trying a simpler, albeit a bit more
>>> tedious, approach which should leave things mostly symmetrical.  I'm
>>> hoping to be able to post mostly working patches this week.
>> I've not had time to look at any of this. But the question I'm most
>> curious about is how cgroup-v2 preserves the container invariant.
>>
>> That is, each container (namespace) should look like a 'real' machine.
>> So just like userns allows to have a uid-0 (aka root) for each container
>> and pidns allows a pid-1 for each container, cgroupns should provide a
>> root group for each container.
>>
>> And cgroup-v2 has this 'exception' (aka wart) for the root group which
>> needs to be replicated for each namespace.
> One of the changes that I proposed in my patches was to get rid of the
> no internal process constraint. I think that will solve a big part of
> the container invariant problem that we have with cgroup v2.
>
> Cheers,
> Longman

Another idea that I have to further solve this container invariant
problem is do a cgroup setup like

CP -- CR

CP - container parent belong to the host
CR - container root

We can enable the pass-through mode at the subtree_control file of CP to
force all CR controllers in pass-through mode. In this case, those
controllers are not enabled in the CR like the root. However, the
container can enable those in the child cgroups just like the root
controller. By enabling those controller in the CP level, the host can
control how much resource is being allowed in the container without the
container being aware that its resources are being controlled as all the
control knobs will show up in the CP, but not in CR.

Cheers,
Longman


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Waiman Long
On 06/01/2017 02:47 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Thu, Jun 01, 2017 at 02:44:48PM -0400, Waiman Long wrote:
>>> And cgroup-v2 has this 'exception' (aka wart) for the root group which
>>> needs to be replicated for each namespace.
>> One of the changes that I proposed in my patches was to get rid of the
>> no internal process constraint. I think that will solve a big part of
>> the container invariant problem that we have with cgroup v2.
> I'm not sure.  It just masks it without actually solving it.  I mean,
> the constraint is thereq for a reason.  "Solving" it would defeat one
> of the main capabilities for resource domains and masking it from
> kernel side doesn't make whole lot of sense to me given that it's
> something which can be easily done from userland.  If we take out that
> part, for controllers which don't care about resource domains,
> wouldn't thread mode be a sufficient solution?

As said in an earlier email, I agreed that masking it on the kernel side
may not be the best solution. I offer 2 other alternatives:
1) Document on how to work around the resource domains issue by proper
setup of the cgroup hierarchy.
2) Mark those controllers that require the no internal process
competition constraint and disallow internal process only when those
controllers are active.

I prefer the first alternative, but I can go with the second if necessary.

The major rationale behind my enhanced thread mode patch was to allow
something like

 R -- A -- B
 \
  T1 -- T2

where you can have resource domain controllers enabled in the thread
root as well as some child cgroups of the thread root. As no internal
process rule is currently not applicable to the thread root, this
creates the dilemma that we need to deal with internal process competition.

The container invariant that PeterZ talked about will also be a serious
issue here as I don't think we are going to set up a container root
cgroup that will have no process allowed in it because it has some child
cgroups. IMHO, I don't think cgroup v2 will get wide adoption without
getting rid of that no internal process constraint.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Tejun Heo
Hello, Waiman.

On Thu, Jun 01, 2017 at 02:44:48PM -0400, Waiman Long wrote:
> > And cgroup-v2 has this 'exception' (aka wart) for the root group which
> > needs to be replicated for each namespace.
> 
> One of the changes that I proposed in my patches was to get rid of the
> no internal process constraint. I think that will solve a big part of
> the container invariant problem that we have with cgroup v2.

I'm not sure.  It just masks it without actually solving it.  I mean,
the constraint is thereq for a reason.  "Solving" it would defeat one
of the main capabilities for resource domains and masking it from
kernel side doesn't make whole lot of sense to me given that it's
something which can be easily done from userland.  If we take out that
part, for controllers which don't care about resource domains,
wouldn't thread mode be a sufficient solution?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Waiman Long
On 06/01/2017 11:10 AM, Peter Zijlstra wrote:
> On Thu, Jun 01, 2017 at 10:50:42AM -0400, Tejun Heo wrote:
>> Hello, Waiman.
>>
>> A short update.  I tried making root special while keeping the
>> existing threaded semantics but I didn't really like it because we
>> have to couple controller enables/disables with threaded
>> enables/disables.  I'm now trying a simpler, albeit a bit more
>> tedious, approach which should leave things mostly symmetrical.  I'm
>> hoping to be able to post mostly working patches this week.
> I've not had time to look at any of this. But the question I'm most
> curious about is how cgroup-v2 preserves the container invariant.
>
> That is, each container (namespace) should look like a 'real' machine.
> So just like userns allows to have a uid-0 (aka root) for each container
> and pidns allows a pid-1 for each container, cgroupns should provide a
> root group for each container.
>
> And cgroup-v2 has this 'exception' (aka wart) for the root group which
> needs to be replicated for each namespace.

One of the changes that I proposed in my patches was to get rid of the
no internal process constraint. I think that will solve a big part of
the container invariant problem that we have with cgroup v2.

Cheers,
Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Waiman Long
On 06/01/2017 10:50 AM, Tejun Heo wrote:
> Hello, Waiman.
>
> A short update.  I tried making root special while keeping the
> existing threaded semantics but I didn't really like it because we
> have to couple controller enables/disables with threaded
> enables/disables.  I'm now trying a simpler, albeit a bit more
> tedious, approach which should leave things mostly symmetrical.  I'm
> hoping to be able to post mostly working patches this week.

I am looking forward to your patches.

> Also, do you mind posting the debug patches as a separate series?
> Let's get the bits which make sense indepdently in the tree.

I am going to do that. The debug patches, however, will have dependency
on other cgroup patches and so will need to be posted after the core
patches.

Cheers,
Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Tejun Heo
Hello, Peter.

On Thu, Jun 01, 2017 at 05:10:45PM +0200, Peter Zijlstra wrote:
> I've not had time to look at any of this. But the question I'm most
> curious about is how cgroup-v2 preserves the container invariant.
> 
> That is, each container (namespace) should look like a 'real' machine.
> So just like userns allows to have a uid-0 (aka root) for each container
> and pidns allows a pid-1 for each container, cgroupns should provide a
> root group for each container.
> 
> And cgroup-v2 has this 'exception' (aka wart) for the root group which
> needs to be replicated for each namespace.

The goal has never been that a container must be indistinguishible
from a real machine.  For certain things, things simply don't have
exact equivalents due to sharing (memory stats or journal writes for
example) and those things are exactly why people prefer containers
over VMs for certain use cases.  If one wants full replication, VM
would be the way to go.

The goal is allowing enough container invariant so that appropriate
workloads can be contained and co-exist in useful ways.  This also
means that the contained workload is usually either a bit illiterate
w.r.t. to the system details (doesn't care) or makes some adjustments
for running inside a container (most quasi-full-system ones already
do).

System root is inherently different from all other nested roots.
Making some exceptions for the root isn't about taking away from other
roots but more reflecting the inherent differences - there are things
which are inherently system / bare-metal.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Peter Zijlstra
On Thu, Jun 01, 2017 at 10:50:42AM -0400, Tejun Heo wrote:
> Hello, Waiman.
> 
> A short update.  I tried making root special while keeping the
> existing threaded semantics but I didn't really like it because we
> have to couple controller enables/disables with threaded
> enables/disables.  I'm now trying a simpler, albeit a bit more
> tedious, approach which should leave things mostly symmetrical.  I'm
> hoping to be able to post mostly working patches this week.

I've not had time to look at any of this. But the question I'm most
curious about is how cgroup-v2 preserves the container invariant.

That is, each container (namespace) should look like a 'real' machine.
So just like userns allows to have a uid-0 (aka root) for each container
and pidns allows a pid-1 for each container, cgroupns should provide a
root group for each container.

And cgroup-v2 has this 'exception' (aka wart) for the root group which
needs to be replicated for each namespace.


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-06-01 Thread Tejun Heo
Hello, Waiman.

A short update.  I tried making root special while keeping the
existing threaded semantics but I didn't really like it because we
have to couple controller enables/disables with threaded
enables/disables.  I'm now trying a simpler, albeit a bit more
tedious, approach which should leave things mostly symmetrical.  I'm
hoping to be able to post mostly working patches this week.

Also, do you mind posting the debug patches as a separate series?
Let's get the bits which make sense indepdently in the tree.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-24 Thread Tejun Heo
Hello,

On Wed, May 24, 2017 at 05:17:13PM -0400, Waiman Long wrote:
> An alternative is to have separate enabling for thread root. For example,
> 
> # echo root > cgroup.threads
> # echo enable > child/cgroup.threads
> 
> The first statement make the current cgroup the thread root. However,
> setting it to a thread root doesn't make its child to be threaded. This
> have to be explicitly done on each of the children. Once a child cgroup
> is made to be threaded, all its descendants will be threaded. That will
> have the same effect as the current patch.

Yeah, I'm toying with different ideas.  I'll get back to you once
things get more concrete.

> With delegation, do you mean the relationship between a container and
> its host?

It can be but doesn't have to be.  For example, it can be delegations
to users without namespace / container being involved.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-24 Thread Waiman Long
On 05/24/2017 04:36 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Mon, May 22, 2017 at 01:13:16PM -0400, Waiman Long wrote:
>>> Maybe I'm misunderstanding the design, but this seems to push the
>>> processes which belong to the threaded subtree to the parent which is
>>> part of the usual resource domain hierarchy thus breaking the no
>>> internal competition constraint.  I'm not sure this is something we'd
>>> want.  Given that the limitation of the original threaded mode was the
>>> required nesting below root and that we treat root special anyway
>>> (exactly in the way necessary), I wonder whether it'd be better to
>>> simply allow root to be both domain and thread root.
>> Yes, root can be both domain and thread root. I haven't placed any
>> restriction on that.
> I've been playing with the proposed "make the parent resource domain".
> Unfortunately, the parent - child relationship becomes weird.
>
> The parent becomes the thread root, which means that its
> cgroup.threads file becomes writable and threads can be put in there.
> It's really weird to write to a child's interface and have the
> parent's behavior changed.  This becomes weirder with delegation.  If
> a cgroup is delegated, its cgroup.threads should be delegated too but
> if the child enables threaded mode, that makes the undelegated parent
> thread root, which means that either 1. the delegatee can't migrate
> threads to the thread root or 2. if the parent's cgroup.threads is
> writeable, the delegatee can mass with other descendants under it
> which shouldn't be allowed.
>
> I think the operation of making a cgroup a thread root should happen
> on the cgroup where that's requested; otherwise, nesting becomes too
> twisted.  This should be solvable.  Will think more about it.
>
> Thanks.
>
An alternative is to have separate enabling for thread root. For example,

# echo root > cgroup.threads
# echo enable > child/cgroup.threads

The first statement make the current cgroup the thread root. However,
setting it to a thread root doesn't make its child to be threaded. This
have to be explicitly done on each of the children. Once a child cgroup
is made to be threaded, all its descendants will be threaded. That will
have the same effect as the current patch.

With delegation, do you mean the relationship between a container and
its host?

Cheers,
Longman


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-24 Thread Tejun Heo
Hello, Waiman.

On Mon, May 22, 2017 at 01:13:16PM -0400, Waiman Long wrote:
> > Maybe I'm misunderstanding the design, but this seems to push the
> > processes which belong to the threaded subtree to the parent which is
> > part of the usual resource domain hierarchy thus breaking the no
> > internal competition constraint.  I'm not sure this is something we'd
> > want.  Given that the limitation of the original threaded mode was the
> > required nesting below root and that we treat root special anyway
> > (exactly in the way necessary), I wonder whether it'd be better to
> > simply allow root to be both domain and thread root.
> 
> Yes, root can be both domain and thread root. I haven't placed any
> restriction on that.

I've been playing with the proposed "make the parent resource domain".
Unfortunately, the parent - child relationship becomes weird.

The parent becomes the thread root, which means that its
cgroup.threads file becomes writable and threads can be put in there.
It's really weird to write to a child's interface and have the
parent's behavior changed.  This becomes weirder with delegation.  If
a cgroup is delegated, its cgroup.threads should be delegated too but
if the child enables threaded mode, that makes the undelegated parent
thread root, which means that either 1. the delegatee can't migrate
threads to the thread root or 2. if the parent's cgroup.threads is
writeable, the delegatee can mass with other descendants under it
which shouldn't be allowed.

I think the operation of making a cgroup a thread root should happen
on the cgroup where that's requested; otherwise, nesting becomes too
twisted.  This should be solvable.  Will think more about it.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-22 Thread Waiman Long
On 05/22/2017 01:13 PM, Waiman Long wrote:
> On 05/19/2017 04:26 PM, Tejun Heo wrote:
>>> @@ -2982,22 +3010,48 @@ static int cgroup_enable_threaded(struct cgroup 
>>> *cgrp)
>>> LIST_HEAD(csets);
>>> struct cgrp_cset_link *link;
>>> struct css_set *cset, *cset_next;
>>> +   struct cgroup *child;
>>> int ret;
>>> +   u16 ss_mask;
>>>  
>>> lockdep_assert_held(&cgroup_mutex);
>>>  
>>> /* noop if already threaded */
>>> -   if (cgrp->proc_cgrp)
>>> +   if (cgroup_is_threaded(cgrp))
>>> return 0;
>>>  
>>> -   /* allow only if there are neither children or enabled controllers */
>>> -   if (css_has_online_children(&cgrp->self) || cgrp->subtree_control)
>>> +   /*
>>> +* Allow only if it is not the root and there are:
>>> +* 1) no children,
>>> +* 2) no non-threaded controllers are enabled, and
>>> +* 3) no attached tasks.
>>> +*
>>> +* With no attached tasks, it is assumed that no css_sets will be
>>> +* linked to the current cgroup. This may not be true if some dead
>>> +* css_sets linger around due to task_struct leakage, for example.
>>> +*/
>> It doesn't look like the code is actually making this (incorrect)
>> assumption.  I suppose the comment is from before
>> cgroup_is_populated() was added?
> Yes, it is a bug. I should have checked the tasks_count instead of using
> cgroup_is_populated. Thanks for catching that.

Sorry, I would like to take it back. I think cgroup_is_populated() will
be set if there is any task attached to the cgroup. So I think it is
doing the right thing with regard to (3).

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-22 Thread Waiman Long
On 05/19/2017 04:26 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Mon, May 15, 2017 at 09:34:10AM -0400, Waiman Long wrote:
>> Now we could have something like
>>
>>  R -- A -- B
>>   \
>>T1 -- T2
>>
>> where R is the thread root, A and B are non-threaded cgroups, T1 and
>> T2 are threaded cgroups. The cgroups R, T1, T2 form a threaded subtree
>> where all the non-threaded resources are accounted for in R.  The no
>> internal process constraint does not apply in the threaded subtree.
>> Non-threaded controllers need to properly handle the competition
>> between internal processes and child cgroups at the thread root.
>>
>> This model will be flexible enough to support the need of the threaded
>> controllers.
> Maybe I'm misunderstanding the design, but this seems to push the
> processes which belong to the threaded subtree to the parent which is
> part of the usual resource domain hierarchy thus breaking the no
> internal competition constraint.  I'm not sure this is something we'd
> want.  Given that the limitation of the original threaded mode was the
> required nesting below root and that we treat root special anyway
> (exactly in the way necessary), I wonder whether it'd be better to
> simply allow root to be both domain and thread root.

Yes, root can be both domain and thread root. I haven't placed any
restriction on that.

>
> Specific review points below but we'd probably want to discuss the
> overall design first.
>
>> +static inline bool cgroup_is_threaded(const struct cgroup *cgrp)
>> +{
>> +return cgrp->proc_cgrp && (cgrp->proc_cgrp != cgrp);
>> +}
>> +
>> +static inline bool cgroup_is_thread_root(const struct cgroup *cgrp)
>> +{
>> +return cgrp->proc_cgrp == cgrp;
>> +}
> Maybe add a bit of comments explaining what's going on with
> ->proc_cgrp?

Sure, will do that.

>>  /**
>> + * threaded_children_count - returns # of threaded children
>> + * @cgrp: cgroup to be tested
>> + *
>> + * cgroup_mutex must be held by the caller.
>> + */
>> +static int threaded_children_count(struct cgroup *cgrp)
>> +{
>> +struct cgroup *child;
>> +int count = 0;
>> +
>> +lockdep_assert_held(&cgroup_mutex);
>> +cgroup_for_each_live_child(child, cgrp)
>> +if (cgroup_is_threaded(child))
>> +count++;
>> +return count;
>> +}
> It probably would be a good idea to keep track of the count so that we
> don't have to count them each time.  There are cases where people end
> up creating a very high number of cgroups and we've already been
> bitten a couple times with silly complexity issues.

Thanks for the suggestion, I can keep a count in the cgroup strcture to
avoid doing that repetitively.

>
>> @@ -2982,22 +3010,48 @@ static int cgroup_enable_threaded(struct cgroup 
>> *cgrp)
>>  LIST_HEAD(csets);
>>  struct cgrp_cset_link *link;
>>  struct css_set *cset, *cset_next;
>> +struct cgroup *child;
>>  int ret;
>> +u16 ss_mask;
>>  
>>  lockdep_assert_held(&cgroup_mutex);
>>  
>>  /* noop if already threaded */
>> -if (cgrp->proc_cgrp)
>> +if (cgroup_is_threaded(cgrp))
>>  return 0;
>>  
>> -/* allow only if there are neither children or enabled controllers */
>> -if (css_has_online_children(&cgrp->self) || cgrp->subtree_control)
>> +/*
>> + * Allow only if it is not the root and there are:
>> + * 1) no children,
>> + * 2) no non-threaded controllers are enabled, and
>> + * 3) no attached tasks.
>> + *
>> + * With no attached tasks, it is assumed that no css_sets will be
>> + * linked to the current cgroup. This may not be true if some dead
>> + * css_sets linger around due to task_struct leakage, for example.
>> + */
> It doesn't look like the code is actually making this (incorrect)
> assumption.  I suppose the comment is from before
> cgroup_is_populated() was added?

Yes, it is a bug. I should have checked the tasks_count instead of using
cgroup_is_populated. Thanks for catching that.

>
>>  spin_lock_irq(&css_set_lock);
>>  list_for_each_entry(link, &cgrp->cset_links, cset_link) {
>>  cset = link->cset;
>> +if (cset->dead)
>> +continue;
> Hmm... is this a bug fix which is necessary regardless of whether we
> change the threadroot semantics or not?

That is true. I put it there because the the reference counting bug
fixed in patch 6 caused a lot of dead csets hanging around before the
fix. I can pull this out as a separate patch.

Cheers,
Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-19 Thread Tejun Heo
Hello,

On Fri, May 19, 2017 at 04:26:24PM -0400, Tejun Heo wrote:
> (exactly in the way necessary), I wonder whether it'd be better to
> simply allow root to be both domain and thread root.

I'll give this approach a shot early next week.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-19 Thread Tejun Heo
Hello, Waiman.

On Mon, May 15, 2017 at 09:34:10AM -0400, Waiman Long wrote:
> Now we could have something like
> 
>   R -- A -- B
>\
> T1 -- T2
> 
> where R is the thread root, A and B are non-threaded cgroups, T1 and
> T2 are threaded cgroups. The cgroups R, T1, T2 form a threaded subtree
> where all the non-threaded resources are accounted for in R.  The no
> internal process constraint does not apply in the threaded subtree.
> Non-threaded controllers need to properly handle the competition
> between internal processes and child cgroups at the thread root.
> 
> This model will be flexible enough to support the need of the threaded
> controllers.

Maybe I'm misunderstanding the design, but this seems to push the
processes which belong to the threaded subtree to the parent which is
part of the usual resource domain hierarchy thus breaking the no
internal competition constraint.  I'm not sure this is something we'd
want.  Given that the limitation of the original threaded mode was the
required nesting below root and that we treat root special anyway
(exactly in the way necessary), I wonder whether it'd be better to
simply allow root to be both domain and thread root.

Specific review points below but we'd probably want to discuss the
overall design first.

> +static inline bool cgroup_is_threaded(const struct cgroup *cgrp)
> +{
> + return cgrp->proc_cgrp && (cgrp->proc_cgrp != cgrp);
> +}
> +
> +static inline bool cgroup_is_thread_root(const struct cgroup *cgrp)
> +{
> + return cgrp->proc_cgrp == cgrp;
> +}

Maybe add a bit of comments explaining what's going on with
->proc_cgrp?

>  /**
> + * threaded_children_count - returns # of threaded children
> + * @cgrp: cgroup to be tested
> + *
> + * cgroup_mutex must be held by the caller.
> + */
> +static int threaded_children_count(struct cgroup *cgrp)
> +{
> + struct cgroup *child;
> + int count = 0;
> +
> + lockdep_assert_held(&cgroup_mutex);
> + cgroup_for_each_live_child(child, cgrp)
> + if (cgroup_is_threaded(child))
> + count++;
> + return count;
> +}

It probably would be a good idea to keep track of the count so that we
don't have to count them each time.  There are cases where people end
up creating a very high number of cgroups and we've already been
bitten a couple times with silly complexity issues.

> @@ -2982,22 +3010,48 @@ static int cgroup_enable_threaded(struct cgroup *cgrp)
>   LIST_HEAD(csets);
>   struct cgrp_cset_link *link;
>   struct css_set *cset, *cset_next;
> + struct cgroup *child;
>   int ret;
> + u16 ss_mask;
>  
>   lockdep_assert_held(&cgroup_mutex);
>  
>   /* noop if already threaded */
> - if (cgrp->proc_cgrp)
> + if (cgroup_is_threaded(cgrp))
>   return 0;
>  
> - /* allow only if there are neither children or enabled controllers */
> - if (css_has_online_children(&cgrp->self) || cgrp->subtree_control)
> + /*
> +  * Allow only if it is not the root and there are:
> +  * 1) no children,
> +  * 2) no non-threaded controllers are enabled, and
> +  * 3) no attached tasks.
> +  *
> +  * With no attached tasks, it is assumed that no css_sets will be
> +  * linked to the current cgroup. This may not be true if some dead
> +  * css_sets linger around due to task_struct leakage, for example.
> +  */

It doesn't look like the code is actually making this (incorrect)
assumption.  I suppose the comment is from before
cgroup_is_populated() was added?

>   spin_lock_irq(&css_set_lock);
>   list_for_each_entry(link, &cgrp->cset_links, cset_link) {
>   cset = link->cset;
> + if (cset->dead)
> + continue;

Hmm... is this a bug fix which is necessary regardless of whether we
change the threadroot semantics or not?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-18 Thread Waiman Long
On 05/17/2017 05:47 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Mon, May 15, 2017 at 09:34:10AM -0400, Waiman Long wrote:
>> The current thread mode semantics aren't sufficient to fully support
>> threaded controllers like cpu. The main problem is that when thread
>> mode is enabled at root (mainly for performance reason), all the
>> non-threaded controllers cannot be supported at all.
>>
>> To alleviate this problem, the roles of thread root and threaded
>> cgroups are now further separated. Now thread mode can only be enabled
>> on a non-root leaf cgroup whose parent will then become the thread
>> root. All the descendants of a threaded cgroup will still need to be
>> threaded. All the non-threaded resource will be accounted for in the
>> thread root. Unlike the previous thread mode, however, a thread root
>> can have non-threaded children where system resources like memory
>> can be further split down the hierarchy.
>>
>> Now we could have something like
>>
>>  R -- A -- B
>>   \
>>T1 -- T2
>>
>> where R is the thread root, A and B are non-threaded cgroups, T1 and
>> T2 are threaded cgroups. The cgroups R, T1, T2 form a threaded subtree
>> where all the non-threaded resources are accounted for in R.  The no
>> internal process constraint does not apply in the threaded subtree.
>> Non-threaded controllers need to properly handle the competition
>> between internal processes and child cgroups at the thread root.
>>
>> This model will be flexible enough to support the need of the threaded
>> controllers.
> I do like the approach and it does address the issue with requiring at
> least one level of nesting for the thread mode to be used with other
> controllers.  I need to think a bit more about it and mull over what
> Peterz was suggesting in the old thread.  I'll get back to you soon
> but I'd really prefer this and the earlier related patches to be in
> its own patchset so that we aren't dealing with different things at
> the same time.
>
> Thanks.
>
I have studied the email exchanges with your original thread mode
patchset. This patchset is aimed to hopefully address all the concerns
that Peterz has. This enhanced thread mode should address a big part of
the concern. However, I am not sure if this patch, by itself, is enough
to address all his concerns. That is why I also include 2 other major
changes in the next 2 patches. My goal is to move forward to allow all
controllers to be enabled for v2 eventually. We are not there yet, but I
hope this patchset can move thing forward meaningfully.

Regards,
Longman


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-17 Thread Tejun Heo
Hello, Waiman.

On Mon, May 15, 2017 at 09:34:10AM -0400, Waiman Long wrote:
> The current thread mode semantics aren't sufficient to fully support
> threaded controllers like cpu. The main problem is that when thread
> mode is enabled at root (mainly for performance reason), all the
> non-threaded controllers cannot be supported at all.
> 
> To alleviate this problem, the roles of thread root and threaded
> cgroups are now further separated. Now thread mode can only be enabled
> on a non-root leaf cgroup whose parent will then become the thread
> root. All the descendants of a threaded cgroup will still need to be
> threaded. All the non-threaded resource will be accounted for in the
> thread root. Unlike the previous thread mode, however, a thread root
> can have non-threaded children where system resources like memory
> can be further split down the hierarchy.
> 
> Now we could have something like
> 
>   R -- A -- B
>\
> T1 -- T2
> 
> where R is the thread root, A and B are non-threaded cgroups, T1 and
> T2 are threaded cgroups. The cgroups R, T1, T2 form a threaded subtree
> where all the non-threaded resources are accounted for in R.  The no
> internal process constraint does not apply in the threaded subtree.
> Non-threaded controllers need to properly handle the competition
> between internal processes and child cgroups at the thread root.
> 
> This model will be flexible enough to support the need of the threaded
> controllers.

I do like the approach and it does address the issue with requiring at
least one level of nesting for the thread mode to be used with other
controllers.  I need to think a bit more about it and mull over what
Peterz was suggesting in the old thread.  I'll get back to you soon
but I'd really prefer this and the earlier related patches to be in
its own patchset so that we aren't dealing with different things at
the same time.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

2017-05-15 Thread Waiman Long
The current thread mode semantics aren't sufficient to fully support
threaded controllers like cpu. The main problem is that when thread
mode is enabled at root (mainly for performance reason), all the
non-threaded controllers cannot be supported at all.

To alleviate this problem, the roles of thread root and threaded
cgroups are now further separated. Now thread mode can only be enabled
on a non-root leaf cgroup whose parent will then become the thread
root. All the descendants of a threaded cgroup will still need to be
threaded. All the non-threaded resource will be accounted for in the
thread root. Unlike the previous thread mode, however, a thread root
can have non-threaded children where system resources like memory
can be further split down the hierarchy.

Now we could have something like

R -- A -- B
 \
  T1 -- T2

where R is the thread root, A and B are non-threaded cgroups, T1 and
T2 are threaded cgroups. The cgroups R, T1, T2 form a threaded subtree
where all the non-threaded resources are accounted for in R.  The no
internal process constraint does not apply in the threaded subtree.
Non-threaded controllers need to properly handle the competition
between internal processes and child cgroups at the thread root.

This model will be flexible enough to support the need of the threaded
controllers.

Signed-off-by: Waiman Long 
---
 Documentation/cgroup-v2.txt |  51 +++
 kernel/cgroup/cgroup-internal.h |  10 +++
 kernel/cgroup/cgroup.c  | 186 +++-
 3 files changed, 209 insertions(+), 38 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 1c6f5a9..3ae7e9c 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -222,21 +222,32 @@ process can be put in different cgroups and are not 
subject to the no
 internal process constraint - threaded controllers can be enabled on
 non-leaf cgroups whether they have threads in them or not.
 
-To enable the thread mode, the following conditions must be met.
+To enable the thread mode on a cgroup, the following conditions must
+be met.
 
-- The thread root doesn't have any child cgroups.
+- The cgroup doesn't have any child cgroups.
 
-- The thread root doesn't have any controllers enabled.
+- The cgroup doesn't have any non-threaded controllers enabled.
+
+- The cgroup doesn't have any processes attached to it.
 
 Thread mode can be enabled by writing "enable" to "cgroup.threads"
 file.
 
   # echo enable > cgroup.threads
 
-Inside a threaded subtree, "cgroup.threads" can be read and contains
-the list of the thread IDs of all threads in the cgroup.  Except that
-the operations are per-thread instead of per-process, "cgroup.threads"
-has the same format and behaves the same way as "cgroup.procs".
+The parent of the threaded cgroup will become the thread root, if
+it hasn't been a thread root yet. In other word, thread mode cannot
+be enabled on the root cgroup as it doesn't have a parent cgroup. A
+thread root can have child cgroups and controllers enabled before
+becoming one.
+
+A threaded subtree includes the thread root and all the threaded child
+cgroups as well as their descendants which are all threaded cgroups.
+"cgroup.threads" can be read and contains the list of the thread
+IDs of all threads in the cgroup.  Except that the operations are
+per-thread instead of per-process, "cgroup.threads" has the same
+format and behaves the same way as "cgroup.procs".
 
 The thread root serves as the resource domain for the whole subtree,
 and, while the threads can be scattered across the subtree, all the
@@ -246,25 +257,30 @@ not readable in the subtree proper.  However, 
"cgroup.procs" can be
 written to from anywhere in the subtree to migrate all threads of the
 matching process to the cgroup.
 
-Only threaded controllers can be enabled in a threaded subtree.  When
-a threaded controller is enabled inside a threaded subtree, it only
-accounts for and controls resource consumptions associated with the
-threads in the cgroup and its descendants.  All consumptions which
-aren't tied to a specific thread belong to the thread root.
+Only threaded controllers can be enabled in a non-root threaded cgroup.
+When a threaded controller is enabled inside a threaded subtree,
+it only accounts for and controls resource consumptions associated
+with the threads in the cgroup and its descendants.  All consumptions
+which aren't tied to a specific thread belong to the thread root.
 
 Because a threaded subtree is exempt from no internal process
 constraint, a threaded controller must be able to handle competition
 between threads in a non-leaf cgroup and its child cgroups.  Each
 threaded controller defines how such competitions are handled.
 
+A new child cgroup created under a thread root will not be threaded.
+Thread mode has to be explicitly enabled on each of the thread root's
+children.  Descendants of a threaded cgroup, however, will a