Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-22 Thread Srivatsa Vaddagiri
On Thu, Mar 22, 2007 at 09:39:09AM -0500, Serge E. Hallyn wrote:
> > What troubles will mounting both cpuset and ns in the same hierarchy
> > cause?
> 
> Wow, don't recall the full context here.

Sorry to have come back so late on this :)

> But at least with Paul's container patchset, a subsystem can only be mounted 
> once.  So if the nsproxy container subsystem is always mounted by itself, 
> then you cannot remount it to bind it with cpusets.

Sure. I was thinking of mounting both ns and cpuset in the same hierarchy
from the very first mount itself.

-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-22 Thread Serge E. Hallyn
Quoting Srivatsa Vaddagiri ([EMAIL PROTECTED]):
> On Fri, Mar 09, 2007 at 10:50:17AM -0600, Serge E. Hallyn wrote:
> > The nsproxy container subsystem could be said to be that unification.
> > If we really wanted to I suppose we could now always mount the nsproxy
> > subsystem, get rid of tsk->nsproxy, and always get that through its
> > nsproxy subsystem container.  But then that causes trouble with being
> > able to mount a hierarchy like
> > 
> > mount -t container -o ns,cpuset
> 
> What troubles will mounting both cpuset and ns in the same hierarchy
> cause?

Wow, don't recall the full context here.  But at least with Paul's
container patchset, a subsystem can only be mounted once.  So if the
nsproxy container subsystem is always mounted by itself, then you cannot
remount it to bind it with cpusets.
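
A minimal userspace sketch of that single-mount constraint in mount(2)
terms (the paths and the exact error value are assumptions, not taken
from the patches):

    /* sketch.c: the single-mount constraint, in mount(2) terms */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
            /* first mount binds the ns subsystem to its own hierarchy */
            if (mount("container", "/ns", "container", 0, "ns"))
                    perror("mount ns");

            /* a second hierarchy trying to bind ns together with cpuset
             * is refused, since ns is already bound elsewhere
             * (EBUSY is an assumed error value) */
            if (mount("container", "/containers", "container", 0, "ns,cpuset"))
                    perror("mount ns,cpuset");      /* expected to fail */
            return 0;
    }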

> IMO that may be a good feature by itself, which makes it convenient to 
> bind different containers to different cpusets.

Absolutely.

-serge

> In this case, we want the 'ns' subsystem to override all decisions w.r.t.
> mkdir of directories and also movement of tasks between different
> groups. This is automatically accomplished in the patches by having the ns
> subsystem veto mkdir/can_attach requests which aren't allowed as per
> namespace semantics (but which may be allowed as per cpuset semantics).
> 
> > so we'd have to fix something.  It also slows things down...
> 
> -- 
> Regards,
> vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-22 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 10:50:17AM -0600, Serge E. Hallyn wrote:
> The nsproxy container subsystem could be said to be that unification.
> If we really wanted to I suppose we could now always mount the nsproxy
> subsystem, get rid of tsk->nsproxy, and always get that through its
> nsproxy subsystem container.  But then that causes trouble with being
> able to mount a hierarchy like
> 
>   mount -t container -o ns,cpuset

What troubles will mounting both cpuset and ns in the same hierarchy
cause? IMO that may be a good feature by itself, which makes it convenient to 
bind different containers to different cpusets.

In this case, we want the 'ns' subsystem to override all decisions w.r.t.
mkdir of directories and also movement of tasks between different
groups. This is automatically accomplished in the patches by having the ns
subsystem veto mkdir/can_attach requests which aren't allowed as per
namespace semantics (but which may be allowed as per cpuset semantics).
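
To make that veto concrete, a hypothetical sketch of such a callback
(the signature and the container_nsproxy() helper are assumptions about
the patches, not quotes from them):

    static int ns_can_attach(struct container_subsys *ss,
                             struct container *cont,
                             struct task_struct *tsk)
    {
            /* assumed helper: the nsproxy this container represents */
            struct nsproxy *ns = container_nsproxy(cont);

            /* refuse moves that namespace semantics don't allow, even
             * if cpuset semantics alone would have permitted them */
            if (tsk->nsproxy != ns)
                    return -EPERM;
            return 0;
    }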

> so we'd have to fix something.  It also slows things down...

-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-13 Thread Herbert Poetzl
On Mon, Mar 12, 2007 at 07:25:48PM -0700, Paul Menage wrote:
> On 3/12/07, Herbert Poetzl <[EMAIL PROTECTED]> wrote:
> >
> > why? you simply enter that specific space and
> > use the existing mechanisms (netlink, proc, whatever)
> > to retrieve the information with _existing_ tools,
> 
> That's assuming that you're using network namespace virtualization,

or isolation :)

> with each group of tasks in a separate namespace. 

correct ...

> What if you don't want the virtualization overhead, just the
> accounting?

there should be no 'virtualization' overhead, and what
do you want to account for, if not by a group of tasks?

maybe I'm missing the grouping condition here, but I
assume you assign tasks to the accounting containers

note: network isolation is not supposed to add overhead
compared to the host system (at least no measurable
overhead)

best,
Herbert

> Paul


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Paul Menage

On 3/12/07, Herbert Poetzl <[EMAIL PROTECTED]> wrote:


> why? you simply enter that specific space and
> use the existing mechanisms (netlink, proc, whatever)
> to retrieve the information with _existing_ tools,


That's assuming that you're using network namespace virtualization,
with each group of tasks in a separate namespace. What if you don't
want the virtualization overhead, just the accounting?

Paul


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Tue, Mar 13, 2007 at 12:31:13AM +0100, Herbert Poetzl wrote:
> just means that the current Linux-VServer behaviour
> is a subset of that, no problem there as long as
> it really _is_ a subset :) we always like to provide
> more features in the future, no problem with that :)

Considering the example Sam quoted, doesn't it make sense to split
resource classes (some of them at least) independently of each other?
That would also argue for providing the multiple-hierarchy feature in
Paul's patches.

Given that, and the mail Serge sent on why the nsproxy optimization is
useful given the numbers, can you reconsider your earlier proposals as
below:

- pid_ns and resource parameters should be in a single struct
  (point 1c, 7b in [1])

- pointers to resource controlling objects should be inserted
  in task_struct directly (instead of nsproxy indirection)
  (points 2c in [1])

[1] http://lkml.org/lkml/2007/3/12/138
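
A rough C sketch of the two layouts being compared, to make the
proposals concrete (all type and field names here are illustrative
assumptions, not from the patches):

    struct pid_namespace;
    struct nsproxy;
    struct res_limits;                      /* assumed resource object */

    /* 1c/7b: resource parameters live with the pid namespace, so a
     * task reaches them through tsk->nsproxy->pid_ns */
    struct pid_ns_sketch {
            struct pid_namespace *ns;
            struct res_limits *res;         /* limits shared per namespace */
    };

    /* 2c: direct pointers in task_struct, no nsproxy indirection */
    struct task_sketch {
            struct nsproxy *nsproxy;        /* namespaces only */
            struct res_limits *res;         /* one dereference to limits */
    };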

-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Herbert Poetzl
On Mon, Mar 12, 2007 at 09:50:45PM +0530, Srivatsa Vaddagiri wrote:
> On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
> > What's wrong with that?
> 
> I had been asking around on "what is the fundamental unit of res mgmt
> for vservers" and the answer I got (from Herbert) was "all tasks that are
> in the same pid namespace". From what you are saying above, it seems to
> be that there is no such "fundamental" unit. It can be a random mixture
> of tasks (taken across vservers) whose resource consumption needs to be
> controlled. Is that correct?

just means that the current Linux-VServer behaviour
is a subset of that, no problem there as long as
it really _is_ a subset :) we always like to provide
more features in the future, no problem with that :)

best,
Herbert

> > >   echo "cid 2" > /dev/cpu/prof/tasks 
> > 
> > Adding that feature sounds fine, 
> 
> Ok yes, that can be an optional feature.
> 
> -- 
> Regards,
> vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Herbert Poetzl
On Mon, Mar 12, 2007 at 03:00:25AM -0700, Paul Menage wrote:
> On 3/11/07, Paul Jackson <[EMAIL PROTECTED]> wrote:
> >
> > My current understanding of Paul Menage's container patch is that it is
> > a useful improvement for some of the metered classes - those that could
> > make good use of a file system like hierarchy for their interface.
> > It probably doesn't benefit all metered classes, as they won't all
> > benefit from a file system like hierarchy, or even have a formal name
> > space, and it doesn't seem to benefit the name space implementation,
> > which happily remains flat.
> 
> Well, what I was aiming at was a generic mechanism that can handle
> "namespaces", "metered classes" and other ways of providing
> per-task-group behaviour. So a system-call API doesn't necessarily
> have the right flexibility to implement the possible different kinds
> of subsystems I envisage.
> 
> For example, one way to easily tie groups of processes to different
> network queues is to have a tag associated with a container, allow
> that to propagate to the socket/skbuf priority field, and then use
> standard Linux traffic control to pick the appropriate outgoing queue
> based on the skbuf's tag.
> 
> This isn't really a namespace, and it isn't really a "metered class".
> It's just a way of associating a piece of data (the network tag) with
> a group of processes.
> 
> With a filesystem-based interface, it's easy to have a file as the
> method of reading/writing the tag; with a system call interface, then
> either the interface is sufficiently generic to allow this kind of
> data association (in which case you're sort of implementing a
> filesystem in the system call) or else you have to shoehorn into an
> unrelated API (e.g. if your system call talks about "resource limits"
> you might end up having to specify the network tag as a "maximum
> limit" since there's no other useful configuration data available).
> 
> As another example, I'd like to have a subsystem that shows me all the
> sockets that processes in the container have opened; again, easy to do
> in a filesystem interface, but hard to fit into a
> > resource-metering-centric or namespace-centric system call API.

why? you simply enter that specific space and
use the existing mechanisms (netlink, proc, whatever)
to retrieve the information with _existing_ tools,
no need to do anything unusual via the syscall API

and if you support a 'spectator' context or capability
(which allows you to see the _whole_ truth) you can also
get this information for _all_ sockets with existing
tools like netstat or lsof ...

best,
Herbert

> Paul


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Sam Vilain
Srivatsa Vaddagiri wrote:
> On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
>   
>> What's wrong with that?
>> 
>
> I had been asking around on "what is the fundamental unit of res mgmt
> for vservers" and the answer I got (from Herbert) was "all tasks that are
> in the same pid namespace". From what you are saying above, it seems to
> be that there is no such "fundamental" unit. It can be a random mixture
> of tasks (taken across vservers) whose resource consumption needs to be
> controlled. Is that correct?
>   

Sure, for instance, all postgres processes across all servers might be
put in a separate IO and buffer-cache-usage container by the system
administrator.

Sam.



Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Paul Jackson
vatsa wrote:
> This assumes that you can see the global vfs namespace, right?
> 
> What if you are inside a container/vserver which restricts your vfs
> namespace? i.e. /dev/cpuset as seen from one container is not the same as
> what is seen from another container.

Well, yes.  But that restriction on the namespace is no doing of
cpusets.

It's some vfs namespace restriction, which should be an orthogonal
mechanism.

Well, it's probably not orthogonal at present.  Cpusets might not yet
handle a restricted vfs name space very well.

For example the /proc/<pid>/cpuset path, giving the path below /dev/cpuset
of task pid's cpuset, might not be restricted.  And the set of all CPUs
and Memory Nodes that are online, which is visible in various /proc
files, and also visible in one's top cpuset, might be inconsistent if a
restricted vfs namespace mapped you to a different top cpuset.

There are probably other loose ends as well.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Serge E. Hallyn
Quoting Srivatsa Vaddagiri ([EMAIL PROTECTED]):
> On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
> > What's wrong with that?
> 
> I had been asking around on "what is the fundamental unit of res mgmt
> for vservers" and the answer I got (from Herbert) was "all tasks that are
> in the same pid namespace". From what you are saying above, it seems to
> be that there is no such "fundamental" unit. It can be a random mixture
> of tasks (taken across vservers) whose resource consumption needs to be
> controlled. Is that correct?

If I'm reading it right, yes.

If for vservers the fundamental unit of res mgmt is a vserver, that can
surely be done at a higher level than in the kernel.

Actually, these could be tied just by doing

mount -t container -o ns,cpuset /containers

So now any task in /containers/vserver1 or any subdirectory thereof
would have the same cpuset constraints as /containers.  OTOH, you could
mount them separately

mount -t container -o ns /nsproxy
mount -t container -o cpuset /cpuset

and now you have the freedom to split tasks in the same vserver
(under /nsproxy/vserver1) into different cpusets.
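
Concretely, that split is then an ordinary file write; a minimal
userspace sketch, assuming an illustrative /cpuset/lowprio cpuset has
been created:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            /* equivalent of: echo $$ > /cpuset/lowprio/tasks */
            FILE *f = fopen("/cpuset/lowprio/tasks", "w");

            if (!f) {
                    perror("fopen");
                    return 1;
            }
            fprintf(f, "%d\n", getpid());
            return fclose(f);
    }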

-serge

> > >   echo "cid 2" > /dev/cpu/prof/tasks 
> > 
> > Adding that feature sounds fine, 
> 
> Ok yes, that can be an optional feature.
> 
> -- 
> Regards,
> vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
> What's wrong with that?

I had been asking around on "what is the fundamental unit of res mgmt
for vservers" and the answer I got (from Herbert) was "all tasks that are
in the same pid namespace". From what you are saying above, it seems to
be that there is no such "fundamental" unit. It can be a random mixture
of tasks (taken across vservers) whose resource consumption needs to be
controlled. Is that correct?

> > echo "cid 2" > /dev/cpu/prof/tasks 
> 
> Adding that feature sounds fine, 

Ok yes, that can be an optional feature.

-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Serge E. Hallyn
Quoting Srivatsa Vaddagiri ([EMAIL PROTECTED]):
> On Fri, Mar 09, 2007 at 02:09:35PM -0800, Paul Menage wrote:
> > > 3. This next leads me to think that the 'tasks' file in each directory
> > > doesn't make sense for containers. In fact it can lend itself to error
> > > situations (by administrator/script mistake) when some tasks of a
> > > container are in one resource class while others are in a different class.
> > >
> > > Instead, from a containers pov, it may be useful to write
> > > a 'container id' (if such a thing exists) into the tasks file
> > > which will move all the tasks of the container into
> > > the new resource class. This is the same requirement we
> > > discussed long back of moving all threads of a process into a new
> > > resource class.
> > 
> > I think you need to give a more concrete example and use case of what
> > you're trying to propose here. I don't really see what advantage
> > you're getting.
> 
> Ok, this is what I had in mind:
> 
> 
>   mount -t container -o ns /dev/namespace
>   mount -t container -o cpu /dev/cpu
> 
> Let's say we have the namespaces/resource-groups created as under:
> 
>   /dev/namespace
>   |-- prof
>   |   |- tasks <- (T1, T2)
>   |   |- container_id <- 1 (doesn't exist today perhaps)
>   |
>   |-- student
>   |   |- tasks <- (T3, T4)
>   |   |- container_id <- 2 (doesn't exist today perhaps)
> 
>   /dev/cpu
>   |-- prof
>   |   |-- tasks
>   |   |-- cpu_limit (40%)
>   |
>   |-- student
>   |   |-- tasks
>   |   |-- cpu_limit (20%)
> 
> 
> Is it possible to create the above structure in container patches? 
> /me thinks so.
> 
> If so, then someone could accidentally do this:
> 
>   echo T1 > /dev/cpu/prof/tasks
>   echo T2 > /dev/cpu/student/tasks
> 
> with the result that tasks of the same container are now in different
> resource classes.

What's wrong with that?

> That's why in the case of containers I felt we shouldn't allow individual
> tasks to be echoed into the tasks file.
> 
> Or rather, it may be nice to say:
> 
>   echo "cid 2" > /dev/cpu/prof/tasks 
> 
> and have all tasks belonging to container id 2 move to the new resource
> group.

Adding that feature sounds fine, but don't go stopping me from putting
T1 into /dev/cpu/prof/tasks and T2 into /dev/cpu/student/tasks just
because you have your own notion of what each task is supposed to be.
Just because they're in the same namespaces doesn't mean they should get
the same resource allocations.  If you want to add that kind of policy,
well, it should be policy - user-definable.

-serge


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Mon, Mar 12, 2007 at 07:31:48PM +0530, Srivatsa Vaddagiri wrote:
> not so. This in fact lets vservers and containers work with each
> other. So:

s/containers/cpusets

-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 02:09:35PM -0800, Paul Menage wrote:
> > 3. This next leads me to think that the 'tasks' file in each directory
> > doesn't make sense for containers. In fact it can lend itself to error
> > situations (by administrator/script mistake) when some tasks of a
> > container are in one resource class while others are in a different class.
> >
> > Instead, from a containers pov, it may be useful to write
> > a 'container id' (if such a thing exists) into the tasks file
> > which will move all the tasks of the container into
> > the new resource class. This is the same requirement we
> > discussed long back of moving all threads of a process into a new
> > resource class.
> 
> I think you need to give a more concrete example and use case of what
> you're trying to propose here. I don't really see what advantage
> you're getting.

Ok, this is what I had in mind:


mount -t container -o ns /dev/namespace
mount -t container -o cpu /dev/cpu

Let's say we have the namespaces/resource-groups created as under:

/dev/namespace
|-- prof
|   |- tasks <- (T1, T2)
|   |- container_id <- 1 (doesn't exist today perhaps)
|
|-- student
|   |- tasks <- (T3, T4)
|   |- container_id <- 2 (doesn't exist today perhaps)

/dev/cpu
|-- prof
|   |-- tasks
|   |-- cpu_limit (40%)
|
|-- student
|   |-- tasks
|   |-- cpu_limit (20%)


Is it possible to create the above structure in container patches? 
/me thinks so.

If so, then someone could accidentally do this:

echo T1 > /dev/cpu/prof/tasks
echo T2 > /dev/cpu/student/tasks

with the result that tasks of the same container are now in different
resource classes.

That's why in the case of containers I felt we shouldn't allow individual
tasks to be echoed into the tasks file.

Or rather, it may be nice to say:

echo "cid 2" > /dev/cpu/prof/tasks 

and have all tasks belonging to container id 2 move to the new resource
group.
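
To make the proposed semantics concrete, a hypothetical kernel-side
sketch of what such a write could trigger (every name below is an
assumption, no such interface exists in the patches, and locking
details are glossed over):

    static void move_tasks_by_cid(int cid, struct container *dest)
    {
            struct task_struct *p;

            read_lock(&tasklist_lock);
            for_each_process(p)
                    if (task_container_id(p) == cid)  /* assumed helper */
                            attach_task(dest, p);     /* assumed helper */
            read_unlock(&tasklist_lock);
    }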



-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Wed, Mar 07, 2007 at 03:59:19PM -0600, Serge E. Hallyn wrote:
> > containers patches use just a single pointer in the task_struct, and
> > all tasks in the same set of containers (across all hierarchies) will
> > share a single container_group object, which holds the actual pointers
> > to container state.
> 
> Yes, that's why this consolidation doesn't make sense to me.
> 
> Especially considering again that we will now have nsproxies pointing to
> containers pointing to... nsproxies.

nsproxies needn't point to containers. nsproxy (or, as Herbert pointed
out, nsproxy->pid_ns) can have direct pointers to the resource objects
(whatever struct container->subsys[] points to).


-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 02:06:03PM -0800, Paul Jackson wrote:
> >  if you create a 'resource container' to limit the
> >  usage of a set of resources for the processes
> >  belonging to this container, it would be kind of
> >  defeating the purpose, if you'd allow the processes
> >  to manipulate their limits, no?
> 
> Wrong - this is not the only way.
> 
> For instance in cpusets, -any- task in the system, regardless of what
> cpuset it is currently assigned to, might be able to manipulate -any-
> cpuset in the system.
> 
> Yes -- some sufficient mechanism is required to keep tasks from
> escalating their resources or capabilities beyond an allowed point.
> 
> But that mechanism might not be strictly based on position in some
> hierarchy.
> 
> In the case of cpusets, it is based on the permissions on files in
> the cpuset file system (normally mounted at /dev/cpuset), versus
> the current privileges and capabilities of the task.
> 
> A root privileged task in the smallest leaf node cpuset can manipulate
> every cpuset in the system.  This is an ordinary and common occurrence.

This assumes that you can see the global vfs namespace, right?

What if you are inside a container/vserver which restricts your vfs
namespace? i.e. /dev/cpuset as seen from one container is not the same as
what is seen from another container. Is that an unrealistic scenario? IMHO
not so. This in fact lets vservers and containers work with each
other. So:

/dev/cpuset
|- C1   <- Container A bound to this
|  |- C11
|  |- C12
|
|- C2   <- Container B bound to this
|  |- C21
|  |- C22


C1 and C2 are two exclusive cpusets and containers/vservers A and B are bound 
to C1/C2 respectively.

From inside container/vserver A, if you were to look at /dev/cpuset, it will
-appear- as if you are in the top cpuset (with just C11 and C12 child
cpusets). It cannot modify C2 at all (since it has no visibility).

Similarly if you were to look at /dev/cpuset from inside B, it will list only 
C21/C22 with tasks in container B not being able to see C1 at all.

:)


-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Paul Menage

On 3/11/07, Paul Jackson <[EMAIL PROTECTED]> wrote:


> My current understanding of Paul Menage's container patch is that it is
> a useful improvement for some of the metered classes - those that could
> make good use of a file system like hierarchy for their interface.
> It probably doesn't benefit all metered classes, as they won't all
> benefit from a file system like hierarchy, or even have a formal name
> space, and it doesn't seem to benefit the name space implementation,
> which happily remains flat.


Well, what I was aiming at was a generic mechanism that can handle
"namespaces", "metered classes" and other ways of providing
per-task-group behaviour. So a system-call API doesn't necessarily
have the right flexibility to implement the possible different kinds
of subsystems I envisage.

For example, one way to easily tie groups of processes to different
network queues is to have a tag associated with a container, allow
that to propagate to the socket/skbuf priority field, and then use
standard Linux traffic control to pick the appropriate outgoing queue
based on the skbuf's tag.

This isn't really a namespace, and it isn't really a "metered class".
It's just a way of associating a piece of data (the network tag) with
a group of processes.
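
The per-socket analogue of that tag is the existing SO_PRIORITY socket
option; under this proposal the value would be inherited from the task's
container rather than set explicitly, but the effect on traffic control
is the same:

    #include <sys/socket.h>

    /* tag a socket so Linux traffic control can classify it into a
     * queue; 'tag' plays the role of the per-container network tag */
    static int tag_socket(int fd, int tag)
    {
            return setsockopt(fd, SOL_SOCKET, SO_PRIORITY,
                              &tag, sizeof(tag));
    }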

With a filesystem-based interface, it's easy to have a file as the
method of reading/writing the tag; with a system call interface, then
either the interface is sufficiently generic to allow this kind of
data association (in which case you're sort of implementing a
filesystem in the system call) or else you have to shoehorn into an
unrelated API (e.g. if your system call talks about "resource limits"
you might end up having to specify the network tag as a "maximum
limit" since there's no other useful configuration data available).

As another example, I'd like to have a subsystem that shows me all the
sockets that processes in the container have opened; again, easy to do
in a filesystem interface, but hard to fit into a
resource-metering-centric or namespace-centric system call API.

Paul


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Sam Vilain
Allow me to annotate your nice summary. A lot of this is elaborating on
what you are saying; and I think where we disagree, the differences are
not important.

Paul Jackson wrote:
> We have actors, known as threads, tasks or processes, which use things,
> which are instances of such classes of things as disk partitions,
> file systems, memory, cpus, and semaphores.
>
> We assign names to these things, such as SysV id's to the semaphores,
> mount points to the file systems, pathnames to files and file
> descriptors to open files.  These names provide handles that
> are typically more convenient and efficient to use, but alas less
> persistent, less ubiquitous, and needing of some dereferencing when
> used, to identify the underlying thing.
>
> Any particular assignment of names to some of the things in a particular
> class forms one namespace (aka 'space', above).  For each class of
> things, a given task is assigned one such namespace.  Typically many
> related tasks (such as all those of a login session or a job) will be
> assigned the same set of namespaces, leading to various opportunities
> for optimizing the management of namespaces in the kernel.
>
> This assignment of names to things is neither injective nor surjective
> nor even a complete map.
>
> For example, not all file systems are mounted, certainly not all
> possible mount points (all directories) serve as mount points,
> sometimes the same file system is mounted in multiple places, and
> sometimes more than one file system is mounted on the same mount point,
> one hiding the other.
>   

Right, which is why I preferred the term "mount space" or "mount
namespace". The keys in the map are not as important as the presence of
the independent map itself.

Unadorned "namespaces" is currently how they are known, and short of
becoming the Hurd I don't think this term is appropriate for Linux.

> In so far as the code managing this naming is concerned, the names are
> usually fairly arbitrary, except that there seems to be a tendency
> toward properly virtualizing these namespaces, presenting to a task
> the namespaces assigned it as if that was all there was, hiding the
> presence of alternative namespaces, and intentionally not providing a
> 'global view' that encompasses all namespaces of a given class.
>
> This tendency culminates in the full blown virtual machines, such as
> Xen and KVM, which virtualize more or less all namespaces.
>   

Yes, these systems, somewhat akin to microkernels, virtualize all
namespaces as a byproduct of their nature.

> Because the essential semantics relating one namespace to another are
> rather weak (the namespaces for any given class of things are or can
> be pretty much independent of each other), there is a preference and
> a tradition to keep such sets of namespaces a simple flat space.
>   

This has been the practice to date with most worked implementations,
with the proviso that as the feature becomes standard people may start
expecting spaces within spaces to work indistinguishably from top level
spaces; in fact, perhaps there should be no such distinction between a
"top" space and a subservient space, other than the higher level space
is aware of the subservient space.

Consider, for instance, that BIND already uses Linux kernel features
which are normally only attributed to the top space, such as adjusting
ulimits and unsetting capability bits. This kind of application
self-containment may become more commonplace.

And this perhaps begs the question: is it worth the performance penalty,
or must there be one?

> Conclusions regarding namespaces, aka spaces:
>
> A namespace provide a set of convenient handles for things of a
> particular class.
>
> For each class of things, every task gets one namespace (perhaps
> a Null or Default one.)
>   

Conceptually, every task exists in exactly one space of each type,
though that space may see and/or administer other spaces.

> Namespaces are partial virtualizations, the 'space of namespaces'
> is pretty flat, and the assignment of names in one namespace is
> pretty independent of the next.
>
> ===
>
> That much covers what I understand (perhaps in error) of namespaces.
>
> So what's this resource accounting/limits stuff?
>
> I think this depends on adding one more category to our universe.
>
> For the purposes of introducing yet more terms, I will call this
> new category a "metered class."
>   

The term "metered" implies "a resource which renews over time". How does
this apply to a fixed limit? A limit's nominal unit may not be delimited
in terms of time, but it must be continually maintained, so it can be
"metered" in terms of use of that limit over time.

For instance, a single system in scheduling terms is limited to the use
of the number of CPUs present in the system. So, while it has a "limit"
of 2 CPUs, in terms of a metered resource, it has a maximum rate of 2
CPU seconds per second.

Each time we set about to manage some resource, we tend to construct
some more elaborate "metered classes" out of the elemental classes
of things (partitions, cpus, ...) listed above.

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-11 Thread Paul Jackson
Sam, responding to Herbert:
> > from my personal PoV the following would be fine:
> >
> >  spaces (for the various 'spaces')
> >...
> >  container (for resource accounting/limits)
> >...
> 
> I like these a lot ...

Hmmm ... ok ...

Let me see if I understand this.

We have actors, known as threads, tasks or processes, which use things,
which are instances of such classes of things as disk partitions,
file systems, memory, cpus, and semaphores.

We assign names to these things, such as SysV id's to the semaphores,
mount points to the file systems, pathnames to files and file
descriptors to open files.  These names provide handles that
are typically more convenient and efficient to use, but alas less
persistent, less ubiquitous, and needing some dereferencing when
used, to identify the underlying thing.

Any particular assignment of names to some of the things in a particular
class forms one namespace (aka 'space', above).  For each class of
things, a given task is assigned one such namespace.  Typically many
related tasks (such as all those of a login session or a job) will be
assigned the same set of namespaces, leading to various opportunities
for optimizing the management of namespaces in the kernel.

This assignment of names to things is neither injective nor surjective
nor even a complete map.

For example, not all file systems are mounted, certainly not all
possible mount points (all directories) serve as mount points,
sometimes the same file system is mounted in multiple places, and
sometimes more than one file system is mounted on the same mount point,
one hiding the other.

In so far as the code managing this naming is concerned, the names are
usually fairly arbitrary, except that there seems to be a tendency
toward properly virtualizing these namespaces, presenting to a task
the namespaces assigned it as if that was all there was, hiding the
presence of alternative namespaces, and intentionally not providing a
'global view' that encompasses all namespaces of a given class.

This tendency culminates in the full blown virtual machines, such as
Xen and KVM, which virtualize more or less all namespaces.

Because the essential semantics relating one namespace to another are
rather weak (the namespaces for any given class of things are or can
be pretty much independent of each other), there is a preference and
a tradition to keep such sets of namespaces a simple flat space.

Conclusions regarding namespaces, aka spaces:

A namespace provides a set of convenient handles for things of a
particular class.

For each class of things, every task gets one namespace (perhaps
a Null or Default one.)

Namespaces are partial virtualizations, the 'space of namespaces'
is pretty flat, and the assignment of names in one namespace is
pretty independent of the next.

===

That much covers what I understand (perhaps in error) of namespaces.

So what's this resource accounting/limits stuff?

I think this depends on adding one more category to our universe.

For the purposes of introducing yet more terms, I will call this
new category a "metered class."

Each time we set about to manage some resource, we tend to construct
some more elaborate "metered classes" out of the elemental classes
of things (partitions, cpus, ...) listed above.

Examples of these more elaborate metered classes include percentages
of a network's bandwidth, fractions of a node's memory (the fake numa
patch), subsets of the system's cpus and nodes (cpusets), ...

These more elaborate metered classes each have fairly 'interesting'
and specialized forms.  Their semantics are closely adapted to the
underlying class of things from which they are formed, and to the
usually challenging, often conflicting, constraints on managing the
usage of such a resource.

For example, the rules that apply to percentages of a network's
bandwidth have little in common with the rules that apply to sets of
subsets of a system's cpus and nodes.

We then attach tasks to these metered classes.  Each task is assigned
one metered instance from each metered class.  For example, each task
is assigned to a cpuset.

For metered classes that are visible across the system, we tend
to name these classes, and then use those names when attaching
tasks to them.  See for example cpusets.

For metered classes that are only privately visible within the
current context of a task, such as setrlimit, set_mempolicy and
mbind, we tend to implicitly attach each task to its current metered
class and provide it explicit means to manipulate the individual
attributes of that metered class by direct system calls.
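
setrlimit is the familiar instance of that pattern; the task adjusts its
own, implicitly attached metered class through a direct system call:

    #include <sys/resource.h>

    /* tighten this task's open-file limit; no named shared object is
     * involved - the 'metered class' is private to the task */
    static int limit_files(void)
    {
            struct rlimit rl = { .rlim_cur = 256, .rlim_max = 1024 };

            return setrlimit(RLIMIT_NOFILE, &rl);
    }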

Conclusions regarding metered classes, aka containers:

Unlike namespaces, metered classes have rich and varied semantics,
sometimes elaborate inheritance and transfer rules, and frequently
non-flat topologies.

Depending on the scope of visibility of a metered class, it may
or may not have much of a formal name space.

==

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-10 Thread Sam Vilain
Herbert Poetzl wrote:
> On Wed, Mar 07, 2007 at 11:44:58PM -0700, Eric W. Biederman wrote:
>   
>> I really don't much care as long as we don't start redefining
>> container as something else.  I think the IBM guys took it from
>> solaris originally which seems to define a zone as a set of
>> isolated processes (for us all separate namespaces).  And a container
>> as a zone that uses resource control.  Not exactly how
>> we have been using the term but close enough not to confuse someone.
>>
>> As long as we don't go calling the individual subsystems or the
>> process groups they need to function a container I really don't care.
>> [...]
>> Resource groups at least for subset of subsystems that aren't
>> namespaces sounds reasonable.  Heck resource group, resource
>> controller, resource subsystem, resource just about anything seems
>> sane to me.
>>
>> The important part is that we find a vocabulary without doubly
>> defined words so we can communicate and a small common set we can
>> agree on so people can work on and implement the individual
>> resource controllers/groups, and get the individual pieces merged
>> as they are ready.
>> 
>
> from my personal PoV the following would be fine:
>
>  spaces (for the various 'spaces')
>
>   - similar enough to the old namespace
>   - can be easily used with prefix/postfix
> like in pid_space, mnt_space, uts_space etc
>   - AFAIK, it is not used yet for anything else
>
>  container (for resource accounting/limits)
>
>   - has the 'containment' principle built in :)
>   - is used in similar ways in other solutions
>   - sounds similar to context (easy to associate)
>
> note: I'm also fine with other names, as long as
> we find some usable vocabulary soon, [...]

I like these a lot, particularly in that "mount space" could be a
reasonable replacement for "namespace".

As a result of this discussion, I see the sense in Paul Menage's
original choice of term.

There's just one problem.  We'd have to rename the mailing list to
"spaces and containers" :-)

Sam.


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Sat, Mar 10, 2007 at 07:32:20AM +0530, Srivatsa Vaddagiri wrote:
> Ok, let me see if I can convey what I had in mind better:
> 
>      uts_ns   pid_ns   ipc_ns
>          \      |      /
>          ---------------
>         |    nsproxy    |
>          ---------------
>          /   |   \            <-- 'nsproxy' pointer
>        T1   T2   T3 ... T1000
>         |    |    |       |   <-- 'containers' pointer (4/8 KB for 1000 tasks)
>        -------------------
>       |  container_group  |
>        -------------------
>                 |
>        -------------------
>       |     container     |
>        -------------------
>                 |
>        -------------------
>       |     cpu_limit     |
>        -------------------

[snip]

> We save on 4/8 KB (for 1000 tasks) by avoiding the 'containers' pointer
> in each task_struct (just to get to the resource limit information).

Having the 'containers' pointer in each task-struct is great from a
non-container res mgmt perspective. It lets you dynamically decide what
is the fundamental unit of res mgmt. 

It could be {T1, T5} tasks/threads of a process, or {T1, T3, T8, T10} tasks of 
a session (for limiting login time per session), or {T1, T2 ..T10, T18, T27} 
tasks of a user etc.

But from a vserver/container pov, this level of flexibility (at a -task-
level) in deciding the unit of res mgmt is IMHO not needed. The
vserver/container/namespace (tsk->nsproxy->some_ns) to which a task
belongs automatically defines that unit of res mgmt.
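
The tradeoff, expressed as access paths (a rough sketch; the res fields
and the container_group layout are assumptions):

    struct res_limits;                       /* assumed resource object */

    /* per-task grouping: any subset of tasks can be the unit */
    static struct res_limits *unit_via_containers(struct task_struct *t)
    {
            return t->containers->res;       /* assumed field layout */
    }

    /* namespace-derived grouping: the unit is fixed by the namespace */
    static struct res_limits *unit_via_nsproxy(struct task_struct *t)
    {
            return t->nsproxy->res;          /* assumed direct pointer */
    }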


-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Paul Jackson
> the emphasis here is on 'from inside' which basically
> boils down to the following:
> 
>  if you create a 'resource container' to limit the
>  usage of a set of resources for the processes
>  belonging to this container, it would be kind of
>  defeating the purpose, if you'd allow the processes
>  to manipulate their limits, no?

Wrong - this is not the only way.

For instance in cpusets, -any- task in the system, regardless of what
cpuset it is currently assigned to, might be able to manipulate -any-
cpuset in the system.

Yes -- some sufficient mechanism is required to keep tasks from
escalating their resources or capabilities beyond an allowed point.

But that mechanism might not be strictly based on position in some
hierarchy.

In the case of cpusets, it is based on the permissions on files in
the cpuset file system (normally mounted at /dev/cpuset), versus
the current privileges and capabilities of the task.

A root privileged task in the smallest leaf node cpuset can manipulate
every cpuset in the system.  This is an ordinary and common occurrence.

I say again, as you seem to be skipping over this detail, one
advantage of basing an API on a file system is the usefulness of
the file system permission model (the -rwxrwxrwx permissions and
the uid/gid owners on each file and directory node).
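
Concretely, that delegation needs nothing beyond ordinary vfs
operations; a minimal sketch (the /dev/cpuset/batch subtree and the uid
are invented for illustration):

    #include <sys/stat.h>
    #include <unistd.h>

    /* hand the 'batch' subtree to uid 1000: after this, that user can
     * manage tasks in "batch" but nothing above it */
    static int delegate_batch_cpuset(void)
    {
            if (chown("/dev/cpuset/batch", 1000, 1000))
                    return -1;
            return chmod("/dev/cpuset/batch/tasks", 0644);
    }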

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Herbert Poetzl
On Fri, Mar 09, 2007 at 11:49:08PM +0530, Srivatsa Vaddagiri wrote:
> On Fri, Mar 09, 2007 at 01:53:57AM +0100, Herbert Poetzl wrote:
>>> The real trick is that I believe these groupings are designed to
>>> be something you can setup on login and then not be able to switch
>>> out of. Which means we can't use sessions and process groups as the
>>> grouping entities as those have different semantics.
>> 
>> precisely, once you are inside a resource container, you
>> must not have the ability to modify its limits, and to
>> some degree, you should not know about the actual available
>> resources, but only about the artificial limits

the emphasis here is on 'from inside' which basically
boils down to the following:

 if you create a 'resource container' to limit the
 usage of a set of resources for the processes
 belonging to this container, it would be kind of
 defeating the purpose, if you'd allow the processes
 to manipulate their limits, no?

> From a non-container workload management perspective, we do desire
> dynamic manipulation of limits associated with a group and also the
> ability to move tasks across resource-classes/groups.

the above doesn't mean that there aren't processes
_outside_ the resource container which have the
necessary capabilities to manipulate the container
in any way (changing limits dynamically, moving
tasks in and out of the container, etc ...)

best,
Herbert

> -- 
> Regards,
> vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Paul Jackson
Herbert wrote (and vatsa quoted):
> precisely, once you are inside a resource container, you
> must not have the ability to modify its limits, and to
> some degree, you should not know about the actual available
> resources, but only about the artificial limits

Not necessarily.  Depending on the resource we are managing, and on how
all encompassing one chooses to make the virtualization, this might be
an overly Draconian permission model.

Certainly in cpusets, any task can see the whole system view, if
there are fairly typical permissions on some /dev/cpuset files.  A task
might even be able to change those limits for any task on the system,
if it has stronger priviledge.

Whether or not to really virtualize something, so that the contained
task can't tell it is inside a cocoon, is a somewhat separate question
from whether or not to limit a task's use of a particular precious
resource.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 01:53:57AM +0100, Herbert Poetzl wrote:
> > The real trick is that I believe these groupings are designed to
> > be something you can setup on login and then not be able to switch
> > out of. Which means we can't use sessions and process groups as the
> > grouping entities as those have different semantics.
> 
> precisely, once you are inside a resource container, you
> must not have the ability to modify its limits, and to
> some degree, you should not know about the actual available
> resources, but only about the artificial limits

From a non-container workload management perspective, we do desire dynamic
manipulation of limits associated with a group and also the ability to move
tasks across resource-classes/groups.

-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Serge E. Hallyn
Quoting Paul Menage ([EMAIL PROTECTED]):
> On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
> >
> >Ok, they share this characteristic with namespaces: that they group
> >processes.

Namespaces have a side effect of grouping processes, but a namespace is
not defined by 'grouping processes.'  A container is, in fact, a group of
processes.

> > So, they conceptually hang off task_struct.  But we put them
> >on ns_proxy because we've got this vague notion that things might be
> >better that way.
> 
> Remember that I'm not the one pushing to move them into ns_proxy.
> These patches are all Srivatsa's work. Despite that fact that they say
> "Signed-off-by: Paul Menage", I'd never seen them before they were
> posted to LKML, and I'm not sure that they're the right approach.
> (Although some form of unification might be good).

The nsproxy container subsystem could be said to be that unification.
If we really wanted to I suppose we could now always mount the nsproxy
subsystem, get rid of tsk->nsproxy, and always get that through its
nsproxy subsystem container.  But then that causes trouble with being
able to mount a hierarchy like

mount -t container -o ns,cpuset

so we'd have to fix something.  It also slows things down...

> >>> about this you still insist on calling this sub-system specific stuff
> >>> the "container",
> >>>
> >> Uh, no. I'm trying to call a *grouping* of processes a container.
> >>
> >
> >Ok, so is this going to supplant the namespaces too?
> 
> I don't know. It would be nice to have a single object hanging off the
> task struct that contains all the various grouping pointers. Having

The namespaces aren't grouping pointers, they are resource id tables.

I stand by my earlier observation that placing namespace pointers and
grouping pointers in the same structure means that the pointer will end up
pointing to itself.

> something that was flexible enough to handle all the required
> behaviours, or else allowing completely different behaviours for
> different subsets of that structure, could be the fiddly bit.
> 
> See my expanded reply to Eric' earlier post for a possible way of
> unifying them, and simplifying the nsproxy and container.c code in the
> process.

Doesn't ring a bell, I'll have to look around for that...

> >
> >  - resource groups (I get a strange feeling of déjà vu there)
> 
> Resource groups isn't a terrible name for them (although I'd be

I still like 'rug' for resource usage groups :)

-serge


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 10:04:30PM +0530, Srivatsa Vaddagiri wrote:
> 2. Regarding space savings, if 100 tasks are in a container (I don't know
> what is a typical number) -and- let's say that all tasks are to share
> the same resource allocation (which seems to be natural), then having
> a 'struct container_group *' pointer in each task_struct seems to be not
> very efficient (simply because we don't need that task-level granularity
> of managing resource allocation).

Note that this 'struct container_group *' pointer is in addition to the
'struct nsproxy *' pointer already in task_struct. If the set of tasks
over which resource control is applied is typically the same set of tasks
which share the same 'struct nsproxy *' pointer, then IMHO 'struct
container_group *' in each task_struct is not very optimal.

-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Jackson
Matt wrote:
> It's like that Star Trek episode ... except we can't agree on the name

Usually, when there is this much heat and smoke over a name, there is
really an underlying disagreement or misunderstanding over the meaning
of something.

The name becomes the proxy for meaning ;).

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Jackson
> The real trick is that I believe these groupings are designed to be something
> you can setup on login and then not be able to switch out of.  Which means
> we can't use sessions and process groups as the grouping entities as those 
> have different semantics.

Not always on login.  For big administered systems, we use batch schedulers
to manage the placement of multiple jobs, submitted to a run queue by users,
onto the available compute resources.

But I agree with your conclusion - the existing task grouping mechanisms,
while useful for some purposes, don't meet the need here.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Herbert Poetzl
On Wed, Mar 07, 2007 at 11:44:58PM -0700, Eric W. Biederman wrote:
> Matt Helsley <[EMAIL PROTECTED]> writes:
> 
> > On Thu, 2007-03-08 at 16:32 +1300, Sam Vilain wrote:
> >
> > <snip>
> >
> > > Kirill, 060324 18:36+03:
> > > > I propose to use "namespace" naming.
> > > > 1. This is already used in fs.
> > > > 2. This is what IMHO suits at least OpenVZ/Eric
> > > > 3. it has a good acronym "ns".
> > >
> > > Right.  So, now I'll also throw into the mix:
> > >
> > >   - resource groups (I get a strange feeling of déjà vu there)
> >
> > <offtopic>
> > Re: déjà vu: yes!
> >
> > It's like that Star Trek episode ... except we can't agree on the name
> > of the impossible particle we will invent which solves all our problems.
> > </offtopic>
> >
> > At the risk of prolonging the agony I hate to ask: are all of these
> > groupings really concerned with "resources"?
> >
> > >   - supply chains (think supply and demand)
> > >   - accounting classes
> >
> > CKRM's use of the term "class" drew negative comments from Paul Jackson
> > and Andrew Morton about this time last year. That led to my suggestion
> > of "Resource Groups". Unless they've changed their minds...
> >
> > > Do any of those sound remotely close?  If not, your turn :)
> >
> > I'll butt in here: task groups? task sets? confuselets? ;)
> 
> Generically we can use subsystem now for the individual pieces without
> confusing anyone.
> 
> I really don't much care as long as we don't start redefining
> container as something else.  I think the IBM guys took it from
> solaris originally which seems to define a zone as a set of
> isolated processes (for us, all separate namespaces), and a container
> as a zone that uses resource control.  Not exactly how
> we have been using the term but close enough not to confuse someone.
> 
> As long as we don't go calling the individual subsystems or the
> process groups they need to function a container I really don't care.
> 
> I just know that if we use container for just the subsystem level
> it makes effective communication impossible, and code reviews
> essentially impossible.  As the description says one thing the
> reviewer reads it as another and then the patch does not match
> the description.  Leading to NAKs.
> 
> Resource groups, at least for the subset of subsystems that aren't
> namespaces, sounds reasonable.  Heck, resource group, resource
> controller, resource subsystem, resource just about anything seems
> sane to me.
> 
> The important part is that we find a vocabulary without doubly
> defined words so we can communicate and a small common set we can
> agree on so people can work on and implement the individual
> resource controllers/groups, and get the individual pieces merged
> as they are ready.

from my personal PoV the following would be fine:

 spaces (for the various 'spaces')

  - similar enough to the old namespace
  - can be easily used with prefix/postfix
like in pid_space, mnt_space, uts_space etc
  - AFAIK, it is not used yet for anything else

 container (for resource accounting/limits)

  - has the 'containment' principle built in :)
  - is used in similar ways in other solutions
  - sounds similar to context (easy to associate)

note: I'm also fine with other names, as long as
we find some usable vocabulary soon, as the
different terms start confusing me on a regular
basis, and we do not go for already used names,
which would clash with Linux-VServer or OpenVZ
terminology (which would confuse the hell out of
the end-users :)

best,
Herbert

> Eric


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Herbert Poetzl
On Wed, Mar 07, 2007 at 05:35:58PM -0800, Paul Menage wrote:
> On 3/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> Pretty much. For most of the other cases I think we are safe
>> referring to them as resource controls or resource limits. I know
>> that roughly covers what cpusets and beancounters and ckrm currently
>> do.
> 
> Plus resource monitoring (which may often be a subset of resource
> control/limits).

we (Linux-VServer) call that resource accounting, and
it is the first step to resource limits ...

> > The real trick is that I believe these groupings are designed to be
> > something you can setup on login and then not be able to switch out
> > of.
> 
> > That's going to be the case for most resource controllers - is that
> the case for namespaces? (e.g. can any task unshare say its mount
> namespace?)

ATM, yes, and there is no real harm in doing so.
this would be a problem for resource containers,
unless they are strictly hierarchical, i.e. they
only allow further restricting the existing resources
(which might cause some trouble if existing limits
have to be changed at some point)
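
a minimal sketch of that 'only further restrict'
rule (the res_group type and helper are made up
for illustration, not from any posted patch):

	struct res_group {
		struct res_group *parent;
		long limit;
	};

	/* a child may tighten its limit, but never exceed the parent's */
	static int set_limit(struct res_group *rg, long new_limit)
	{
		if (rg->parent && new_limit > rg->parent->limit)
			return -EPERM;
		rg->limit = new_limit;
		return 0;
	}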

best,
Herbert

> Paul


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Herbert Poetzl
On Wed, Mar 07, 2007 at 06:32:10PM -0700, Eric W. Biederman wrote:
> "Paul Menage" <[EMAIL PROTECTED]> writes:
> 
>> On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
>>> But "namespace" has well-established historical semantics too - a way
>>> of changing the mappings of local * to global objects. This
>>> accurately describes things like resource controllers, cpusets, resource
>>> monitoring, etc.
>>
>> Sorry, I think this statement is wrong, by the generally established
>> meaning of the term namespace in computer science.
>>
>>> Trying to extend the well-known term namespace to refer to things
>>> that are semantically equivalent to namespaces is a useful approach,
>>> IMHO.
>>
>> Yes, that would be true. But the kinds of groupings that we're talking
>> about are supersets of namespaces, not semantically equivalent to
>> them. To use Eric's "shoe" analogy from earlier, it's like insisting
>> that we use the term "sneaker" to refer to all footwear, including ski
>> boots and birkenstocks ...
> 
> Pretty much.  For most of the other cases I think we are safe referring
> to them as resource controls or resource limits.  

> I know that roughly covers what cpusets and beancounters and ckrm
> currently do.

let me tell you, it also covers what Linux-VServer does :)

> The real trick is that I believe these groupings are designed to
> be something you can setup on login and then not be able to switch
> out of. Which means we can't use sessions and process groups as the
> grouping entities as those have different semantics.

precisely, once you are inside a resource container, you
must not have the ability to modify its limits, and to
some degree, you should not know about the actual available
resources, but only about the artificial limits

HTC,
Herbert

> Eric


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Menage

On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
>
> Ok, they share this characteristic with namespaces: that they group
> processes.  So, they conceptually hang off task_struct.  But we put them
> on ns_proxy because we've got this vague notion that things might be
> better that way.

Remember that I'm not the one pushing to move them into ns_proxy.
These patches are all Srivatsa's work. Despite the fact that they say
"Signed-off-by: Paul Menage", I'd never seen them before they were
posted to LKML, and I'm not sure that they're the right approach.
(Although some form of unification might be good).

> >> about this you still insist on calling this sub-system specific stuff
> >> the "container",
> >>
> > Uh, no. I'm trying to call a *grouping* of processes a container.
> >
>
> Ok, so is this going to supplant the namespaces too?

I don't know. It would be nice to have a single object hanging off the
task struct that contains all the various grouping pointers. Having
something that was flexible enough to handle all the required
behaviours, or else allowing completely different behaviours for
different subsets of that structure, could be the fiddly bit.

See my expanded reply to Eric's earlier post for a possible way of
unifying them, and simplifying the nsproxy and container.c code in the
process.

>   - resource groups (I get a strange feeling of déjà vú there)

Resource groups isn't a terrible name for them (although I'd be
wondering whether the BeanCounters folks would object :-) ) but the
intention is that they're more generic than purely for resource
accounting. (E.g. see my other email where I suggested that things
like task->mempolicy and task->user could potentially be treated in
the same way)

Task Group is a good name, except for the fact that it's too easily
confused with process group.

> And do we bother changing IPC namespaces or let that one slide?

I think that "namespace" is a fine term for the IPC id
virtualization/restriction that ipc_ns provides. (Unless I'm totally
misunderstanding the concept).

Paul


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman
Matt Helsley <[EMAIL PROTECTED]> writes:

> On Thu, 2007-03-08 at 16:32 +1300, Sam Vilain wrote:
>
> <snip>
>
> > Kirill, 06032418:36+03:
> > > I propose to use "namespace" naming.
> > > 1. This is already used in fs.
> > > 2. This is what IMHO suits at least OpenVZ/Eric
> > > 3. it has good acronym "ns".
> >
> > Right.  So, now I'll also throw into the mix:
> >
> >   - resource groups (I get a strange feeling of déjà vú there)
>
> <offtopic>
> Re: déjà vú: yes!
>
> It's like that Star Trek episode ... except we can't agree on the name
> of the impossible particle we will invent which solves all our problems.
> </offtopic>
>
> At the risk of prolonging the agony I hate to ask: are all of these
> groupings really concerned with "resources"?
>
> >   - supply chains (think supply and demand)
> >   - accounting classes
>
> CKRM's use of the term "class" drew negative comments from Paul Jackson
> and Andrew Morton about this time last year. That led to my suggestion
> of "Resource Groups". Unless they've changed their minds...
>
> > Do any of those sound remotely close?  If not, your turn :)
>
> I'll butt in here: task groups? task sets? confuselets? ;)

Generically we can use subsystem now for the individual pieces without
confusing anyone.

I really don't much care as long as we don't start redefining
container as something else.  I think the IBM guys took it from
solaris originally which seems to define a zone as a set of
isolated processes (for us, all separate namespaces), and a container
as a zone that uses resource control.  Not exactly how
we have been using the term but close enough not to confuse someone.

As long as we don't go calling the individual subsystems or the
process groups they need to function a container I really don't care.

I just know that if we use container for just the subsystem level
it makes effective communication impossible, and code reviews
essentially impossible.  As the description says one thing the
reviewer reads it as another and then the patch does not match
the description.  Leading to NAKs.

Resource groups, at least for the subset of subsystems that aren't
namespaces, sounds reasonable.  Heck, resource group, resource
controller, resource subsystem, resource just about anything seems
sane to me.

The important part is that we find a vocabulary without doubly
defined words so we can communicate and a small common set we can
agree on so people can work on and implement the individual
resource controllers/groups, and get the individual pieces merged
as they are ready.

Eric


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman
Sam Vilain <[EMAIL PROTECTED]> writes:

> And do we bother changing IPC namespaces or let that one slide?

ipc namespaces work (if you worry about tiny details like the fact that
we put the resource limits for the sysv ipc objects inside the namespace).

Probably the most instructive example of this is that if you
map a sysv ipc shared memory segment with shmat and then switch to
another sysvipc namespace, you still have access by reads and writes
to that shared memory segment, but you cannot manipulate it because it
doesn't have a name.

Either that or look at the output of ipcs, before and after an unshare.
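
Roughly this, as a userspace sketch (assuming a kernel where unshare(2)
accepts CLONE_NEWIPC; error handling omitted):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/ipc.h>
	#include <sys/shm.h>

	int main(void)
	{
		int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
		char *p = shmat(id, NULL, 0);

		strcpy(p, "still visible");	/* the mapping works ... */

		unshare(CLONE_NEWIPC);		/* switch to a fresh ipc namespace */

		printf("%s\n", p);		/* ... reads and writes still work */
		if (shmctl(id, IPC_RMID, NULL) < 0)
			perror("shmctl");	/* ... but the old id has no name here */
		return 0;
	}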

SYSVIPC really does have its own (very weird) set of global names,
and that is essentially all the ipc namespace deals with.

I think you have the sysvipc namespace confused with something else
though (like signal sending).

Eric


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Matt Helsley
On Thu, 2007-03-08 at 16:32 +1300, Sam Vilain wrote:

<snip>

> Kirill, 06032418:36+03:
> > I propose to use "namespace" naming.
> > 1. This is already used in fs.
> > 2. This is what IMHO suits at least OpenVZ/Eric
> > 3. it has good acronym "ns".
> 
> Right.  So, now I'll also throw into the mix:
> 
>   - resource groups (I get a strange feeling of déjà vú there)


<offtopic>
Re: déjà vú: yes!

It's like that Star Trek episode ... except we can't agree on the name
of the impossible particle we will invent which solves all our problems.
</offtopic>


At the risk of prolonging the agony I hate to ask: are all of these
groupings really concerned with "resources"?

>   - supply chains (think supply and demand)
>   - accounting classes

CKRM's use of the term "class" drew negative comments from Paul Jackson
and Andrew Morton about this time last year. That led to my suggestion
of "Resource Groups". Unless they've changed their minds...

> Do any of those sound remotely close?  If not, your turn :)

I'll butt in here: task groups? task sets? confuselets? ;)

Cheers,
-Matt Helsley



Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Sam Vilain
Paul Menage wrote:
> I made sure to check [...]wikipedia.org[...] when this argument started ... 
> :-)
>   

Wikipedia?!  That's not a referen[...]

oh bugger it.  I've vented enough today and we're on the same page now I
think.

>> This is the classic terminology problem between substance and function.
>> ie, some things share characteristics but does that mean they are the
>> same thing?
>> 
>
> Aren't you arguing my side here? My point is that what I'm trying to
> add with "containers" (or whatever name we end up using) can't easily
> be subsumed into the "namespace" concept, and you're arguing that they
> should go into nsproxy because they share some characteristics.
>   

Ok, they share this characteristic with namespaces: that they group
processes.  So, they conceptually hang off task_struct.  But we put them
on ns_proxy because we've got this vague notion that things might be
better that way.

>> about this you still insist on calling this sub-system specific stuff
>> the "container",
>> 
> Uh, no. I'm trying to call a *grouping* of processes a container.
>   

Ok, so is this going to supplant the namespaces too?

>> and then go screaming that I am wrong and you are right
>> on terminology.
>> 
>
> Actually I asked if you/Eric had better suggestions.
>   

Cool, let's review them.

Me, 07021311:38+12:
> This would suggest re-writing this patchset, part 2 as a "CPUSet
> namespace", part 4 as a "CPU scheduling namespace", parts 5 and 6 as
> "Resource Limits Namespace" (drop this "BeanCounter" brand), and of
> course part 7 falls away.
Me, 07022110:58+12:
> Did you like the names I came up with in my original reply?
>  - CPUset namespace for CPU partitioning
>  - Resource namespaces:
>    - cpusched namespace for CPU
>    - ulimit namespace for memory
>    - quota namespace for disk space
>    - io namespace for disk activity
>    - etc

Ok, there's nothing original or useful there; I'm obviously quite deliberately 
still punting on the issue.

Eric, 07030718:32-07:
> Pretty much.  For most of the other cases I think we are safe referring
> to them as resource controls or resource limits.  I know that roughly
> covers what cpusets and beancounters and ckrm currently do.

Let's go back in time to the thread I referred to:

Me, 06032209:08+12 and nearby posts
>  - "vserver" spelt in full
>  - family
>  - container
>  - jail
>  - task_ns (sort for namespace)
> Using the term "box" and ID term "boxid":
> create_space - creates a new space and "hashes" it

Kirill, 06032418:36+03:
> I propose to use "namespace" naming.
> 1. This is already used in fs.
> 2. This is what IMHO suits at least OpenVZ/Eric
> 3. it has good acronym "ns".

Right.  So, now I'll also throw into the mix:

  - resource groups (I get a strange feeling of déjà vú there)
  - supply chains (think supply and demand)
  - accounting classes

Do any of those sound remotely close?  If not, your turn :)

And do we bother changing IPC namespaces or let that one slide?

Sam.



Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Paul Menage

On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
>
> Sorry, I didn't realise I was talking with somebody qualified enough to
> speak on behalf of the Generally Established Principles of Computer Science.

I made sure to check

http://en.wikipedia.org/wiki/Namespace
http://en.wikipedia.org/wiki/Namespace_%28computer_science%29

when this argument started ... :-)

> This is the classic terminology problem between substance and function.
> ie, some things share characteristics but does that mean they are the
> same thing?

Aren't you arguing my side here? My point is that what I'm trying to
add with "containers" (or whatever name we end up using) can't easily
be subsumed into the "namespace" concept, and you're arguing that they
should go into nsproxy because they share some characteristics.

> Look, I already agreed in the earlier thread that the term "namespace"
> was being stretched beyond belief, yet instead of trying to be useful
> about this you still insist on calling this sub-system specific stuff
> the "container",

Uh, no. I'm trying to call a *grouping* of processes a container.

> and then go screaming that I am wrong and you are right
> on terminology.

Actually I asked if you/Eric had better suggestions.

Paul


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Sam Vilain
Paul Menage wrote:
> Sorry, I think this statement is wrong, by the generally established
> meaning of the term namespace in computer science.
>   

Sorry, I didn't realise I was talking with somebody qualified enough to
speak on behalf of the Generally Established Principles of Computer Science.

>> Trying to extend the well-known term namespace to refer to things that
>> are semantically equivalent to namespaces is a useful approach, IMHO.
>>
>> 
> Yes, that would be true. But the kinds of groupings that we're talking
> about are supersets of namespaces, not semantically equivalent to
> them. To use Eric's "shoe" analogy from earlier, it's like insisting
> that we use the term "sneaker" to refer to all footwear, including ski
> boots and birkenstocks ...
>   

I see it more like insisting that we use the term "clothing" to also
refer to "weapons" because for both of them you tell your body to "wear"
them in some game.

This is the classic terminology problem between substance and function. 
ie, some things share characteristics but does that mean they are the
same thing?

Look, I already agreed in the earlier thread that the term "namespace"
was being stretched beyond belief, yet instead of trying to be useful
about this you still insist on calling this sub-system specific stuff
the "container", and then go screaming that I am wrong and you are right
on terminology.

I've normally recognised[1] these three things as the primary feature
groups of vserver:

  - isolation
  - resource limiting
  - resource sharing

So I've got no problem with using "clothing" remaining for isolation and
"weapons" for resource sharing and limiting.  Or some other suitable terms.

Sam.

1. eg, http://utsl.gen.nz/talks/vserver/slide4c.html


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman
"Paul Menage" <[EMAIL PROTECTED]> writes:

> On 3/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> The real trick is that I believe these groupings are designed to be something
>> you can setup on login and then not be able to switch out of.
>
> That's going to be the case for most resource controllers - is that
> the case for namespaces? (e.g. can any task unshare say its mount
> namespace?)

With namespaces there are secondary issues with unsharing.  Weird things
like a simple unshare might allow you to replace /etc/shadow and thus
mess up a suid root application.

Once people have worked through those secondary issues unsharing of
namespaces is likely allowable (for someone without CAP_SYS_ADMIN).
Although if you pick the truly hierarchical namespaces the pid
namespace unsharing will simply give you a parent of the current
namespace.

For resource controls I expect unsharing is likely to be like the pid
namespace.  You might allow it but if you do you are forced to be a
child and possibly there will be hierarchy depth restrictions,
assuming you can implement hierarchical accounting without too much
expense.
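
To illustrate the expense: a charge has to be checked against every
ancestor's limit, so each accounting operation costs a walk up the
hierarchy.  A rough sketch (all names made up for illustration):

	struct res_group {
		struct res_group *parent;
		atomic_t usage;
		int limit;
	};

	/* charge 'amount' to a group and all its ancestors, rolling
	 * back if any level would go over its limit */
	static int res_charge(struct res_group *rg, int amount)
	{
		struct res_group *g, *failed;

		for (g = rg; g; g = g->parent)
			if (atomic_add_return(amount, &g->usage) > g->limit)
				goto undo;
		return 0;
	undo:
		failed = g;
		for (g = rg; g != failed; g = g->parent)
			atomic_sub(amount, &g->usage);
		atomic_sub(amount, &failed->usage);
		return -EAGAIN;
	}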



Eric


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Paul Menage

On 3/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>
> Pretty much.  For most of the other cases I think we are safe referring
> to them as resource controls or resource limits.  I know that roughly covers
> what cpusets and beancounters and ckrm currently do.

Plus resource monitoring (which may often be a subset of resource
control/limits).

> The real trick is that I believe these groupings are designed to be something
> you can setup on login and then not be able to switch out of.

That's going to be the case for most resource controllers - is that
the case for namespaces? (e.g. can any task unshare say its mount
namespace?)

Paul


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman
"Paul Menage" <[EMAIL PROTECTED]> writes:

> On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
>> But "namespace" has well-established historical semantics too - a way
>> of changing the mappings of local * to global objects. This
>> accurately describes things like resource controllers, cpusets, resource
>> monitoring, etc.
>
> Sorry, I think this statement is wrong, by the generally established
> meaning of the term namespace in computer science.
>
>>
>> Trying to extend the well-known term namespace to refer to things that
>> are semantically equivalent to namespaces is a useful approach, IMHO.
>>
>
> Yes, that would be true. But the kinds of groupings that we're talking
> about are supersets of namespaces, not semantically equivalent to
> them. To use Eric's "shoe" analogy from earlier, it's like insisting
> that we use the term "sneaker" to refer to all footwear, including ski
> boots and birkenstocks ...

Pretty much.  For most of the other cases I think we are safe referring
to them as resource controls or resource limits.  I know that roughly covers
what cpusets and beancounters and ckrm currently do.

The real trick is that I believe these groupings are designed to be something
you can setup on login and then not be able to switch out of.  Which means
we can't use sessions and process groups as the grouping entities as those 
have different semantics.

Eric


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Paul Menage

On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
>
> But "namespace" has well-established historical semantics too - a way
> of changing the mappings of local * to global objects. This
> accurately describes things like resource controllers, cpusets, resource
> monitoring, etc.

Sorry, I think this statement is wrong, by the generally established
meaning of the term namespace in computer science.

> Trying to extend the well-known term namespace to refer to things that
> are semantically equivalent to namespaces is a useful approach, IMHO.

Yes, that would be true. But the kinds of groupings that we're talking
about are supersets of namespaces, not semantically equivalent to
them. To use Eric's "shoe" analogy from earlier, it's like insisting
that we use the term "sneaker" to refer to all footwear, including ski
boots and birkenstocks ...

Paul


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Srivatsa Vaddagiri
On Wed, Mar 07, 2007 at 09:29:12AM -0800, Paul Menage wrote:
> That seems bad. With the current way you're doing it, if I mount
> hierarchies A and B on /mnt/A and /mnt/B, then initially all tasks are
> in /mnt/A/tasks and /mnt/B/tasks. If I then create /mnt/A/foo and move
> a process into it, that process disappears from /mnt/B/tasks, since
> its nsproxy no longer matches the nsproxy of B's root container. Or am
> I missing something?

I realized that bug as I was doing the cpuset conversion.

Basically, we can't use just tsk->nsproxy to find what tasks are in
a directory (/mnt/B for ex). Here's what I was thinking we should be doing
instead:


	struct nsproxy *ns;
	void *data;

	/* nsproxy attached to the parent directory of /mnt/B/tasks */
	ns = dentry_of(/mnt/B/tasks)->d_parent->d_fsdata;
	/* state of some subsystem id which is bound in the /mnt/B hierarchy */
	data = ns->ctlr_data[subsys_id];

we now scan the tasklist and find a match if:

	tsk->nsproxy->ctlr_data[subsys_id] == data

(maybe we need to match on all data from all subsystems bound to B)

There is a similar bug in rcfs_rmdir also. We can't just use the nsproxy
pointed to by the dentry to know whether the resource objects are free or
not. I am thinking (if at all resource control has to be provided on top
of nsproxy) that we should have a get_res_ns, similar to get_mnt_ns or
get_uts_ns, which will track the number of nsproxies pointing to the same
resource object. If we do that, then rmdir() needs to go and check
those resource objects' refcounts to see if a dir is in use or not.
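
As a rough sketch (the res_ctl type and helpers here are illustrative,
not from the posted patches):

	struct res_ctl {
		atomic_t count;	/* nsproxies sharing this object */
		/* per-subsystem limits/usage would follow */
	};

	static inline struct res_ctl *get_res_ns(struct res_ctl *res)
	{
		atomic_inc(&res->count);
		return res;
	}

	static inline void put_res_ns(struct res_ctl *res)
	{
		if (atomic_dec_and_test(&res->count))
			kfree(res);
	}

rmdir() would then fail with -EBUSY as long as some other nsproxy still
holds a reference to the directory's resource objects.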

-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-06 Thread Srivatsa Vaddagiri
On Tue, Mar 06, 2007 at 02:28:39PM +0100, Herbert Poetzl wrote:
> groups like Memory, Disk Space, Sockets might make
> sense though, although we never had a single request
> for any overlapping in the resource management (while
> we have quite a few users of overlapping Network spaces)

If we have to provide this flexibility of different groupings for
different resources, then I don't see how we can get to the limit with
less than 3 dereferences (unless we bloat the task_struct to point to
the various limit structures directly).

-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-06 Thread Herbert Poetzl
On Tue, Mar 06, 2007 at 04:09:40PM +0530, Srivatsa Vaddagiri wrote:
> On Mon, Mar 05, 2007 at 07:39:37PM +0100, Herbert Poetzl wrote:
> > > Thats why nsproxy has pointers to resource control objects, rather
> > > than embedding resource control information in nsproxy itself.
> > 
> > which makes it a (name)space, no?
> 
> I tend to agree, yes!
> 
> > > This will let different nsproxy structures share the same resource
> > > control objects (ctlr_data) and thus be governed by the same
> > > parameters.
> > 
> > as it is currently done for vfs, uts, ipc and soon
> > pid and network l2/l3, yes?
> 
> yes (by vfs do you mean mnt_ns?)

yep

> > > Where else do you think the resource control information for a
> > > container should be stored?
> > 
> > an alternative for that is to keep the resource
> > stuff as part of a 'context' structure, and keep
> > a reference from the task to that (one less
> > indirection, as we had for vfs before)
> 
> something like:
> 
>   struct resource_context {
>   int cpu_limit;
>   int rss_limit;
>   /* all other limits here */
>   }
> 
>   struct task_struct {
>   ...
>   struct resource_context *rc;
> 
>   }
> 
> ?
> 
> With this approach, it is hard to have task-groupings that are
> unique to each resource.

that is correct ...

> For ex: lets say that CPU and Memory needs to be divided as follows:
> 
>   CPU : C1 (70%), C2 (30%)
>   Mem : M1 (60%), M2 (40%)
> 
> Tasks T1, T2, T3, T4 are assigned to these resource classes as follows:
> 
>   C1 : T1, T3
>   C2 : T2, T4
>   M1 : T1, T4
>   M2 : T2, T3
> 
> We had a lengthy discussion on this requirement here:
> 
>   http://lkml.org/lkml/2006/11/6/95
>   http://lkml.org/lkml/2006/11/1/239
> 
> Linus also has expressed a similar view here:
> 
>   http://lwn.net/Articles/94573/

you probably could get that flexibility by grouping
certain limits into a separate struct, but IMHO the
real world use of this is limited, because the resource
limitations usually only fulfill one purpose, being
protection from malicious users and DoS prevention

groups like Memory, Disk Space, Sockets might make
sense though, although we never had a single request
for any overlapping in the resource management (while
we have quite a few users of overlapping Network spaces)

> Paul Menage's (and its clone rcfs) patches allow this flexibility by
> simply mounting different hierarchies:
> 
>   mount -t container -o cpu none /dev/cpu
>   mount -t container -o mem none /dev/mem
> 
> The task-groups created under /dev/cpu can be completely independent of
> task-groups created under /dev/mem.
> 
> Lumping together all resource parameters in one struct (like
> resource_context above) makes it difficult to provide this feature.   
> 
> Now can we live w/o this flexibility? Maybe, I don't know for sure.
> Since (stability of) user-interface is in question, we need to take a
> careful decision here.

I don't like the dev/filesystem interface at all
but I can probably live with it :)

> > > then other dereferences (->ctlr_data[] and ->limit) should be fast, as
> > > they should be in the cache?
> > 
> > please provide real world numbers from testing ...
> 
> What kind of testing did you have in mind?

for example, implement RSS/VM limits and run memory
intensive tests like kernel building or so, see that
the accounting and limit checks do not add measurable
overhead ...

similar could be done for socket/ipc accounting and
multithreaded network tests (apache comes to my mind)

HTH,
Herbert

> -- 
> Regards,
> vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-06 Thread Srivatsa Vaddagiri
On Mon, Mar 05, 2007 at 07:39:37PM +0100, Herbert Poetzl wrote:
> > Thats why nsproxy has pointers to resource control objects, rather
> > than embedding resource control information in nsproxy itself.
> 
> which makes it a (name)space, no?

I tend to agree, yes!

> > This will let different nsproxy structures share the same resource
> > control objects (ctlr_data) and thus be governed by the same
> > parameters.
> 
> as it is currently done for vfs, uts, ipc and soon
> pid and network l2/l3, yes?

yes (by vfs do you mean mnt_ns?)

> > Where else do you think the resource control information for a
> > container should be stored?
> 
> an alternative for that is to keep the resource
> stuff as part of a 'context' structure, and keep
> a reference from the task to that (one less
> indirection, as we had for vfs before)

something like:

struct resource_context {
int cpu_limit;
int rss_limit;
/* all other limits here */
}

struct task_struct {
...
struct resource_context *rc;

}

?

With this approach, it is hard to have task-groupings that are
unique to each resource.

For ex: lets say that CPU and Memory needs to be divided as follows:

CPU : C1 (70%), C2 (30%)
Mem : M1 (60%), M2 (40%)

Tasks T1, T2, T3, T4 are assigned to these resource classes as follows:

C1 : T1, T3
C2 : T2, T4
M1 : T1, T4
M2 : T2, T3

We had a lengthy discussion on this requirement here:

http://lkml.org/lkml/2006/11/6/95
http://lkml.org/lkml/2006/11/1/239

Linus also has expressed a similar view here:

http://lwn.net/Articles/94573/

Paul Menage's (and its clone rcfs) patches allow this flexibility by simply
mounting different hierarchies:

mount -t container -o cpu none /dev/cpu
mount -t container -o mem none /dev/mem

The task-groups created under /dev/cpu can be completely independent of
task-groups created under /dev/mem.

Lumping together all resource parameters in one struct (like
resource_context above) makes it difficult to provide this feature. 

Now can we live w/o this flexibility? Maybe, I don't know for sure.
Since (stability of) user-interface is in question, we need to take a
careful decision here.

> > then other dereferences (->ctlr_data[] and ->limit) should be fast, as
> > they should be in the cache?
> 
> please provide real world numbers from testing ...

What kind of testing did you have in mind?


-- 
Regards,
vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-05 Thread Herbert Poetzl
On Mon, Mar 05, 2007 at 11:04:01PM +0530, Srivatsa Vaddagiri wrote:
> On Sat, Mar 03, 2007 at 06:32:44PM +0100, Herbert Poetzl wrote:
> > > Yes, perhaps this overloads nsproxy more than what it was intended for.
> > > But, then if we have to to support resource management of each
> > > container/vserver (or whatever group is represented by nsproxy),
> > > then nsproxy seems the best place to store this resource control 
> > > information for a container.
> > 
> > well, the thing is, as nsproxy is working now, you
> > will get a new one (with a changed subset of entries)
> > every time a task does a clone() with one of the 
> > space flags set, which means, that you will end up
> > with quite a lot of them, but resource limits have
> > to address a group of them, not a single nsproxy
> > (or act in a deeply hierarchical way which is not
> > there atm, and probably will never be, as it simply
> > adds too much overhead)
> 
> Thats why nsproxy has pointers to resource control objects, rather
> than embedding resource control information in nsproxy itself.

which makes it a (name)space, no?

> From the patches:
> 
> struct nsproxy {
> 
> +#ifdef CONFIG_RCFS
> +   struct list_head list;
> +   void *ctlr_data[CONFIG_MAX_RC_SUBSYS];
> +#endif
> 
> }
> 
> This will let different nsproxy structures share the same resource
> control objects (ctlr_data) and thus be governed by the same
> parameters.

as it is currently done for vfs, uts, ipc and soon
pid and network l2/l3, yes?

> Where else do you think the resource control information for a
> container should be stored?

an alternative for that is to keep the resource
stuff as part of a 'context' structure, and keep
a reference from the task to that (one less
indirection, as we had for vfs before)

> > > It should have the same perf overhead as the original
> > > container patches (basically a double dereference -
> > > task->containers/nsproxy->cpuset - required to get to the 
> > > cpuset from a task).
> > 
> > on every limit accounting or check? I think that
> > is quite a lot of overhead ...
> 
> tsk->nsproxy->ctlr_data[cpu_ctlr->id]->limit (4 dereferences)
> is what we need to get to the cpu b/w limit for a task.

sounds very 'cache intensive' to me ...
(especially compared to the one indirection we use atm)

> If cpu_ctlr->id is compile time decided, then that would reduce it to 3.
> 
> But I think if the CPU scheduler schedules tasks from the same
> container one after another (to the extent possible, that is),

which is very probably not what you want, as it
 
 - will definitely hurt interactivity
 - give strange 'jerky' behaviour
 - ignore established priorities

> then other dereferences (->ctlr_data[] and ->limit) should be fast, as
> they should be in the cache?

please provide real world numbers from testing ...

at least for me, that is not really obvious in
four way indirection  :)

TIA,
Herbert

> -- 
> Regards,
> vatsa


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-05 Thread Srivatsa Vaddagiri
On Sat, Mar 03, 2007 at 06:32:44PM +0100, Herbert Poetzl wrote:
> > Yes, perhaps this overloads nsproxy more than what it was intended for.
> > But, then if we have to to support resource management of each
> > container/vserver (or whatever group is represented by nsproxy),
> > then nsproxy seems the best place to store this resource control 
> > information for a container.
> 
> well, the thing is, as nsproxy is working now, you
> will get a new one (with a changed subset of entries)
> every time a task does a clone() with one of the 
> space flags set, which means, that you will end up
> with quite a lot of them, but resource limits have
> to address a group of them, not a single nsproxy
> (or act in a deeply hierarchical way which is not
> there atm, and probably will never be, as it simply
> adds too much overhead)

Thats why nsproxy has pointers to resource control objects, rather than
embedding resource control information in nsproxy itself.

From the patches:

struct nsproxy {

+#ifdef CONFIG_RCFS
+   struct list_head list;
+   void *ctlr_data[CONFIG_MAX_RC_SUBSYS];
+#endif

}

This will let different nsproxy structures share the same resource
control objects (ctlr_data) and thus be governed by the same parameters.

Where else do you think the resource control information for a container
should be stored?

> > It should have the same perf overhead as the original
> > container patches (basically a double dereference -
> > task->containers/nsproxy->cpuset - required to get to the 
> > cpuset from a task).
> 
> on every limit accounting or check? I think that
> is quite a lot of overhead ...

tsk->nsproxy->ctlr_data[cpu_ctlr->id]->limit (4 dereferences) is what we 
need to get to the cpu b/w limit for a task.

If cpu_ctlr->id is compile time decided, then that would reduce it to 3.

But I think if the CPU scheduler schedules tasks from the same container one
after another (to the extent possible, that is), then the other dereferences
(->ctlr_data[] and ->limit) should be fast, as they should be in the cache?
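
Spelled out, the whole chain is something like this (field names follow
the patches; the helper itself and struct cpu_limit are only illustrative):

	static inline int task_cpu_limit(struct task_struct *tsk)
	{
		struct nsproxy *ns = tsk->nsproxy;		/* 1 */
		int id = cpu_ctlr->id;				/* 2 */
		struct cpu_limit *cl = ns->ctlr_data[id];	/* 3 */

		return cl->limit;				/* 4 */
	}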


-- 
Regards,
vatsa