Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-22 Thread Srivatsa Vaddagiri
On Thu, Mar 22, 2007 at 09:39:09AM -0500, Serge E. Hallyn wrote:
> > What troubles will mounting both cpuset and ns in the same hierarchy
> > cause?
> 
> Wow, don't recall the full context here.

Sorry to have come back so late on this :)

> But at least with Paul's container patchset, a subsystem can only be mounted 
> once.  So if the nsproxy container subsystem is always mounted by itself, 
> then you cannot remount it to bind it with cpusets.

Sure ..I was thinking of mounting both ns and cpuset in same hierarchy
for first time itself.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-22 Thread Serge E. Hallyn
Quoting Srivatsa Vaddagiri ([EMAIL PROTECTED]):
> On Fri, Mar 09, 2007 at 10:50:17AM -0600, Serge E. Hallyn wrote:
> > The nsproxy container subsystem could be said to be that unification.
> > If we really wanted to I suppose we could now always mount the nsproxy
> > subsystem, get rid of tsk->nsproxy, and always get thta through it's
> > nsproxy subsystem container.  But then that causes trouble with being
> > able to mount a hierarachy like
> > 
> > mount -t container -o ns,cpuset
> 
> What troubles will mounting both cpuset and ns in the same hierarchy
> cause?

Wow, don't recall the full context here.  But at least with Paul's
container patchset, a subsystem can only be mounted once.  So if the
nsproxy container subsystem is always mounted by itself, then you cannot
remount it to bind it with cpusets.

> IMO that may be a good feature by itself, which makes it convenient to 
> bind different containers to different cpusets.

Absolutely.

-serge

> In this case, we want 'ns' subsystem to override all decisions wrt
> mkdir of directories and also movement of tasks b/n different
> groups. This is automatically accomplished in the patches, by having ns
> subsystem veto mkdir/can_attach request which aren't allowed as per
> namespace semantics (but which may be allowed as per cpuset semantics).
> 
> > so we'd have to fix something.  It also slows things down...
> 
> -- 
> Regards,
> vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-22 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 10:50:17AM -0600, Serge E. Hallyn wrote:
> The nsproxy container subsystem could be said to be that unification.
> If we really wanted to I suppose we could now always mount the nsproxy
> subsystem, get rid of tsk->nsproxy, and always get thta through it's
> nsproxy subsystem container.  But then that causes trouble with being
> able to mount a hierarachy like
> 
>   mount -t container -o ns,cpuset

What troubles will mounting both cpuset and ns in the same hierarchy
cause? IMO that may be a good feature by itself, which makes it convenient to 
bind different containers to different cpusets.

In this case, we want 'ns' subsystem to override all decisions wrt
mkdir of directories and also movement of tasks b/n different
groups. This is automatically accomplished in the patches, by having ns
subsystem veto mkdir/can_attach request which aren't allowed as per
namespace semantics (but which may be allowed as per cpuset semantics).

> so we'd have to fix something.  It also slows things down...

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-22 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 10:50:17AM -0600, Serge E. Hallyn wrote:
 The nsproxy container subsystem could be said to be that unification.
 If we really wanted to I suppose we could now always mount the nsproxy
 subsystem, get rid of tsk-nsproxy, and always get thta through it's
 nsproxy subsystem container.  But then that causes trouble with being
 able to mount a hierarachy like
 
   mount -t container -o ns,cpuset

What troubles will mounting both cpuset and ns in the same hierarchy
cause? IMO that may be a good feature by itself, which makes it convenient to 
bind different containers to different cpusets.

In this case, we want 'ns' subsystem to override all decisions wrt
mkdir of directories and also movement of tasks b/n different
groups. This is automatically accomplished in the patches, by having ns
subsystem veto mkdir/can_attach request which aren't allowed as per
namespace semantics (but which may be allowed as per cpuset semantics).

 so we'd have to fix something.  It also slows things down...

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-22 Thread Serge E. Hallyn
Quoting Srivatsa Vaddagiri ([EMAIL PROTECTED]):
 On Fri, Mar 09, 2007 at 10:50:17AM -0600, Serge E. Hallyn wrote:
  The nsproxy container subsystem could be said to be that unification.
  If we really wanted to I suppose we could now always mount the nsproxy
  subsystem, get rid of tsk-nsproxy, and always get thta through it's
  nsproxy subsystem container.  But then that causes trouble with being
  able to mount a hierarachy like
  
  mount -t container -o ns,cpuset
 
 What troubles will mounting both cpuset and ns in the same hierarchy
 cause?

Wow, don't recall the full context here.  But at least with Paul's
container patchset, a subsystem can only be mounted once.  So if the
nsproxy container subsystem is always mounted by itself, then you cannot
remount it to bind it with cpusets.

 IMO that may be a good feature by itself, which makes it convenient to 
 bind different containers to different cpusets.

Absolutely.

-serge

 In this case, we want 'ns' subsystem to override all decisions wrt
 mkdir of directories and also movement of tasks b/n different
 groups. This is automatically accomplished in the patches, by having ns
 subsystem veto mkdir/can_attach request which aren't allowed as per
 namespace semantics (but which may be allowed as per cpuset semantics).
 
  so we'd have to fix something.  It also slows things down...
 
 -- 
 Regards,
 vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-22 Thread Srivatsa Vaddagiri
On Thu, Mar 22, 2007 at 09:39:09AM -0500, Serge E. Hallyn wrote:
  What troubles will mounting both cpuset and ns in the same hierarchy
  cause?
 
 Wow, don't recall the full context here.

Sorry to have come back so late on this :)

 But at least with Paul's container patchset, a subsystem can only be mounted 
 once.  So if the nsproxy container subsystem is always mounted by itself, 
 then you cannot remount it to bind it with cpusets.

Sure ..I was thinking of mounting both ns and cpuset in same hierarchy
for first time itself.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-13 Thread Herbert Poetzl
On Mon, Mar 12, 2007 at 07:25:48PM -0700, Paul Menage wrote:
> On 3/12/07, Herbert Poetzl <[EMAIL PROTECTED]> wrote:
> >
> > why? you simply enter that specific space and
> > use the existing mechanisms (netlink, proc, whatever)
> > to retrieve the information with _existing_ tools,
> 
> That's assuming that you're using network namespace virtualization,

or isolation :)

> with each group of tasks in a separate namespace. 

correct ...

> What if you don't want the virtualization overhead, just the
> accounting?

there should be no 'virtualization' overhead, and what
do you want to account for, if not by a group of tasks?

maybe I'm missing the grouping condition here, but I
assume you assign tasks to the accounting containers

note: network isolation is not supposed to add overhead
compared to the host system (at least not measureable
overhead)

best,
Herbert

> Paul
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-13 Thread Herbert Poetzl
On Mon, Mar 12, 2007 at 07:25:48PM -0700, Paul Menage wrote:
 On 3/12/07, Herbert Poetzl [EMAIL PROTECTED] wrote:
 
  why? you simply enter that specific space and
  use the existing mechanisms (netlink, proc, whatever)
  to retrieve the information with _existing_ tools,
 
 That's assuming that you're using network namespace virtualization,

or isolation :)

 with each group of tasks in a separate namespace. 

correct ...

 What if you don't want the virtualization overhead, just the
 accounting?

there should be no 'virtualization' overhead, and what
do you want to account for, if not by a group of tasks?

maybe I'm missing the grouping condition here, but I
assume you assign tasks to the accounting containers

note: network isolation is not supposed to add overhead
compared to the host system (at least not measureable
overhead)

best,
Herbert

 Paul
 ___
 Containers mailing list
 [EMAIL PROTECTED]
 https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Paul Menage

On 3/12/07, Herbert Poetzl <[EMAIL PROTECTED]> wrote:


why? you simply enter that specific space and
use the existing mechanisms (netlink, proc, whatever)
to retrieve the information with _existing_ tools,


That's assuming that you're using network namespace virtualization,
with each group of tasks in a separate namespace. What if you don't
want the virtualization overhead, just the accounting?

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Tue, Mar 13, 2007 at 12:31:13AM +0100, Herbert Poetzl wrote:
> just means that the current Linux-VServer behaviour
> is a subset of that, no problem there as long as
> it really _is_ a subset :) we always like to provide
> more features in the future, no problem with that :)

Considering the example Sam quoted, doesn't it make sense to split
resource classes (some of them atleast) independent of each other?
That would also argue for providing multiple hierarchy feature in Paul's
patches.

Given that and the mail Serge sent on why nsproxy optimization is
usefull given numbers, can you reconsider your earlier proposals as
below:

- pid_ns and resource parameters should be in a single struct
  (point 1c, 7b in [1])

- pointers to resource controlling objects should be inserted
  in task_struct directly (instead of nsproxy indirection)
  (points 2c in [1])

[1] http://lkml.org/lkml/2007/3/12/138

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Herbert Poetzl
On Mon, Mar 12, 2007 at 09:50:45PM +0530, Srivatsa Vaddagiri wrote:
> On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
> > What's wrong with that?
> 
> I had been asking around on "what is the fundamental unit of res mgmt
> for vservers" and the answer I got (from Herbert) was "all tasks that are
> in the same pid namespace". From what you are saying above, it seems to
> be that there is no such "fundamental" unit. It can be a random mixture
> of tasks (taken across vservers) whose resource consumption needs to be
> controlled. Is that correct?

just means that the current Linux-VServer behaviour
is a subset of that, no problem there as long as
it really _is_ a subset :) we always like to provide
more features in the future, no problem with that :)

best,
Herbert

> > >   echo "cid 2" > /dev/cpu/prof/tasks 
> > 
> > Adding that feature sounds fine, 
> 
> Ok yes ..that can be a optional feature.
> 
> -- 
> Regards,
> vatsa
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Herbert Poetzl
On Mon, Mar 12, 2007 at 03:00:25AM -0700, Paul Menage wrote:
> On 3/11/07, Paul Jackson <[EMAIL PROTECTED]> wrote:
> >
> > My current understanding of Paul Menage's container patch is that it is
> > a useful improvement for some of the metered classes - those that could
> > make good use of a file system like hierarchy for their interface.
> > It probably doesn't benefit all metered classes, as they won't all
> > benefit from a file system like hierarchy, or even have a formal name
> > space, and it doesn't seem to benefit the name space implementation,
> > which happily remains flat.
> 
> Well, what I was aiming at was a generic mechanism that can handle
> "namespaces", "metered classes" and other ways of providing
> per-task-group behaviour. So a system-call API doesn't necessarily
> have the right flexibility to implement the possible different kinds
> of subsystems I envisage.
> 
> For example, one way to easily tie groups of processes to different
> network queues is to have a tag associated with a container, allow
> that to propagate to the socket/skbuf priority field, and then use
> standard Linux traffic control to pick the appropriate outgoing queue
> based on the skbuf's tag.
> 
> This isn't really a namespace, and it isn't really a "metered class".
> It's just a way of associating a piece of data (the network tag) with
> a group of processes.
> 
> With a filesystem-based interface, it's easy to have a file as the
> method of reading/writing the tag; with a system call interface, then
> either the interface is sufficiently generic to allow this kind of
> data association (in which case you're sort of implementing a
> filesystem in the system call) or else you have to shoehorn into an
> unrelated API (e.g. if your system call talks about "resource limits"
> you might end up having to specify the network tag as a "maximum
> limit" since there's no other useful configuration data available).
> 
> As another example, I'd like to have a subsystem that shows me all the
> sockets that processes in the container have opened; again, easy to do
> in a filesystem interface, but hard to fit into a
> resource-meteting-centric or namespace-centric system call API.

why? you simply enter that specific space and
use the existing mechanisms (netlink, proc, whatever)
to retrieve the information with _existing_ tools,
no need to do anything unusual via the syscall API

and if you support a 'spectator' context or capability
(which allows to see the _whole_ truth) you can also
get this information for _all_ sockets with existing
tools like netstat or lsof ...

best,
Herbert

> Paul
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Sam Vilain
Srivatsa Vaddagiri wrote:
> On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
>   
>> What's wrong with that?
>> 
>
> I had been asking around on "what is the fundamental unit of res mgmt
> for vservers" and the answer I got (from Herbert) was "all tasks that are
> in the same pid namespace". From what you are saying above, it seems to
> be that there is no such "fundamental" unit. It can be a random mixture
> of tasks (taken across vservers) whose resource consumption needs to be
> controlled. Is that correct?
>   

Sure, for instance, all postgres processes across all servers might be
put in a different IO and buffercache use container by the system
administrator.

Sam.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Paul Jackson
vatsa wrote:
> This assumes that you can see the global vfs namespace right?
> 
> What if you are inside a container/vserver which restricts your vfs
> namespace? i.e /dev/cpusets seen from one container is not same as what
> is seen from another container .

Well, yes.  But that restriction on the namespace is no doing of
cpusets.

It's some vfs namespace restriction, which should be an orthogonal
mechanism.

Well, it's probably not orthogonal at present.  Cpusets might not yet
handle a restricted vfs name space very well.

For example the /proc//cpuset path, giving path below /dev/cpuset
of task pid's cpuset, might not be restricted.  And the set of all CPUs
and Memory Nodes that are online, which is visible in various /proc
files, and also visible in ones top cpuset, might be inconsistent if
restricted vfs namespace mapped you to a different top cpuset.

There are probably other loose ends as well.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Serge E. Hallyn
Quoting Srivatsa Vaddagiri ([EMAIL PROTECTED]):
> On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
> > What's wrong with that?
> 
> I had been asking around on "what is the fundamental unit of res mgmt
> for vservers" and the answer I got (from Herbert) was "all tasks that are
> in the same pid namespace". From what you are saying above, it seems to
> be that there is no such "fundamental" unit. It can be a random mixture
> of tasks (taken across vservers) whose resource consumption needs to be
> controlled. Is that correct?

If I'm reading it right, yes.

If for vservers the fundamental unit of res mgmt is a vserver, that can
surely be done at a higher level than in the kernel.

Actually, these could be tied just by doing

mount -t container -o ns,cpuset /containers

So now any task in /containers/vserver1 or any subdirectory thereof
would have the same cpuset constraints as /containers.  OTOH, you could
mount them separately

mount -t container -o ns /nsproxy
mount -t container -o cpuset /cpuset

and now you have the freedom to split tasks in the same vserver
(under /nsproxy/vserver1) into different cpusets.

-serge

> > >   echo "cid 2" > /dev/cpu/prof/tasks 
> > 
> > Adding that feature sounds fine, 
> 
> Ok yes ..that can be a optional feature.
> 
> -- 
> Regards,
> vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
> What's wrong with that?

I had been asking around on "what is the fundamental unit of res mgmt
for vservers" and the answer I got (from Herbert) was "all tasks that are
in the same pid namespace". From what you are saying above, it seems to
be that there is no such "fundamental" unit. It can be a random mixture
of tasks (taken across vservers) whose resource consumption needs to be
controlled. Is that correct?

> > echo "cid 2" > /dev/cpu/prof/tasks 
> 
> Adding that feature sounds fine, 

Ok yes ..that can be a optional feature.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Serge E. Hallyn
Quoting Srivatsa Vaddagiri ([EMAIL PROTECTED]):
> On Fri, Mar 09, 2007 at 02:09:35PM -0800, Paul Menage wrote:
> > > 3. This next leads me to think that 'tasks' file in each directory doesnt 
> > > make
> > >sense for containers. In fact it can lend itself to error situations 
> > > (by
> > >administrator/script mistake) when some tasks of a container are in one
> > >resource class while others are in a different class.
> > >
> > > Instead, from a containers pov, it may be usefull to write
> > > a 'container id' (if such a thing exists) into the tasks file
> > > which will move all the tasks of the container into
> > > the new resource class. This is the same requirement we
> > > discussed long back of moving all threads of a process into new
> > > resource class.
> > 
> > I think you need to give a more concrete example and use case of what
> > you're trying to propose here. I don't really see what advantage
> > you're getting.
> 
> Ok, this is what I had in mind:
> 
> 
>   mount -t container -o ns /dev/namespace
>   mount -t container -o cpu /dev/cpu
> 
> Lets we have the namespaces/resource-groups created as under:
> 
>   /dev/namespace
>   |-- prof
>   ||- tasks <- (T1, T2)
>   ||- container_id <- 1 (doesnt exist today perhaps)
>   |
>   |-- student
>   ||- tasks <- (T3, T4)
>   ||- container_id <- 2 (doesnt exist today perhaps)
> 
>   /dev/cpu
>  |-- prof
>  ||-- tasks
>  ||-- cpu_limit (40%)
>  |
>  |-- student
>  ||-- tasks
>  ||-- cpu_limit (20%)
>  |
>  |
> 
> 
> Is it possible to create the above structure in container patches? 
> /me thinks so.
> 
> If so, then accidentally someone can do this:
> 
>   echo T1 > /dev/cpu/prof/tasks
>   echo T2 > /dev/cpu/student/tasks
> 
> with the result that tasks of the same container are now in different
> resource classes.

What's wrong with that?

> Thats why in case of containers I felt we shldnt allow individual tasks
> to be cat'ed to tasks file. 
> 
> Or rather, it may be nice to say :
> 
>   echo "cid 2" > /dev/cpu/prof/tasks 
> 
> and have all tasks belonging to container id 2 move to the new resource
> group.

Adding that feature sounds fine, but don't go stopping me from putting
T1 into /dev/cpu/prof/tasks and T2 into /dev/cpu/student/tasks just
because you have your own notion of what each task is supposed to be.
Just because they're in the same namespaces doesn't mean they should get
the same resource allocations.  If you want to add that kind of policy,
well, it should be policy - user-definable.

-serge
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Mon, Mar 12, 2007 at 07:31:48PM +0530, Srivatsa Vaddagiri wrote:
> not so. This in-fact lets vservers and containers to work with each
> other. So:

s/containers/cpusets

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 02:09:35PM -0800, Paul Menage wrote:
> > 3. This next leads me to think that 'tasks' file in each directory doesnt 
> > make
> >sense for containers. In fact it can lend itself to error situations (by
> >administrator/script mistake) when some tasks of a container are in one
> >resource class while others are in a different class.
> >
> > Instead, from a containers pov, it may be usefull to write
> > a 'container id' (if such a thing exists) into the tasks file
> > which will move all the tasks of the container into
> > the new resource class. This is the same requirement we
> > discussed long back of moving all threads of a process into new
> > resource class.
> 
> I think you need to give a more concrete example and use case of what
> you're trying to propose here. I don't really see what advantage
> you're getting.

Ok, this is what I had in mind:


mount -t container -o ns /dev/namespace
mount -t container -o cpu /dev/cpu

Lets we have the namespaces/resource-groups created as under:

/dev/namespace
|-- prof
||- tasks <- (T1, T2)
||- container_id <- 1 (doesnt exist today perhaps)
|
|-- student
||- tasks <- (T3, T4)
||- container_id <- 2 (doesnt exist today perhaps)

/dev/cpu
   |-- prof
   ||-- tasks
   ||-- cpu_limit (40%)
   |
   |-- student
   ||-- tasks
   ||-- cpu_limit (20%)
   |
   |


Is it possible to create the above structure in container patches? 
/me thinks so.

If so, then accidentally someone can do this:

echo T1 > /dev/cpu/prof/tasks
echo T2 > /dev/cpu/student/tasks

with the result that tasks of the same container are now in different
resource classes.

Thats why in case of containers I felt we shldnt allow individual tasks
to be cat'ed to tasks file. 

Or rather, it may be nice to say :

echo "cid 2" > /dev/cpu/prof/tasks 

and have all tasks belonging to container id 2 move to the new resource
group.



-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Wed, Mar 07, 2007 at 03:59:19PM -0600, Serge E. Hallyn wrote:
> > containers patches uses just a single pointer in the task_struct, and
> > all tasks in the same set of containers (across all hierarchies) will
> > share a single container_group object, which holds the actual pointers
> > to container state.
> 
> Yes, that's why this consolidation doesn't make sense to me.
> 
> Especially considering again that we will now have nsproxies pointing to
> containers pointing to... nsproxies.

nsproxies needn't point to containers. It (or as Herbert pointed -
nsproxy->pid_ns) can have direct pointers to resource objects (whatever
struct container->subsys[] points to).


-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 02:06:03PM -0800, Paul Jackson wrote:
> >  if you create a 'resource container' to limit the
> >  usage of a set of resources for the processes
> >  belonging to this container, it would be kind of
> >  defeating the purpose, if you'd allow the processes
> >  to manipulate their limits, no?
> 
> Wrong - this is not the only way.
> 
> For instance in cpusets, -any- task in the system, regardless of what
> cpuset it is currently assigned to, might be able to manipulate -any-
> cpuset in the system.
> 
> Yes -- some sufficient mechanism is required to keep tasks from
> escalating their resources or capabilities beyond an allowed point.
> 
> But that mechanism might not be strictly based on position in some
> hierarchy.
> 
> In the case of cpusets, it is based on the permissions on files in
> the cpuset file system (normally mounted at /dev/cpuset), versus
> the current priviledges and capabilities of the task.
> 
> A root priviledged task in the smallest leaf node cpuset can manipulate
> every cpuset in the system.  This is an ordinary and common occurrence.

This assumes that you can see the global vfs namespace right?

What if you are inside a container/vserver which restricts your vfs
namespace? i.e /dev/cpusets seen from one container is not same as what
is seen from another container ..Is that a unrealistic scenario? IMHO
not so. This in-fact lets vservers and containers to work with each
other. So:

/dev/cpuset
|- C1   <- Container A bound to this
|  |- C11
|  |- C12
|
|- C2   <- Container B bound to this
   |- C21
   |- C22


C1 and C2 are two exclusive cpusets and containers/vservers A and B are bound 
to C1/C2 respectively.

>From inside container/vserver A, if you were to look at /dev/cpuset, it will
-appear- as if you are in the top cpuset (with just C11 and C12 child
cpusets). It cannot modify C2 at all (since it has no visibility).

Similarly if you were to look at /dev/cpuset from inside B, it will list only 
C21/C22 with tasks in container B not being able to see C1 at all.

:)


-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Paul Menage

On 3/11/07, Paul Jackson <[EMAIL PROTECTED]> wrote:


My current understanding of Paul Menage's container patch is that it is
a useful improvement for some of the metered classes - those that could
make good use of a file system like hierarchy for their interface.
It probably doesn't benefit all metered classes, as they won't all
benefit from a file system like hierarchy, or even have a formal name
space, and it doesn't seem to benefit the name space implementation,
which happily remains flat.


Well, what I was aiming at was a generic mechanism that can handle
"namespaces", "metered classes" and other ways of providing
per-task-group behaviour. So a system-call API doesn't necessarily
have the right flexibility to implement the possible different kinds
of subsystems I envisage.

For example, one way to easily tie groups of processes to different
network queues is to have a tag associated with a container, allow
that to propagate to the socket/skbuf priority field, and then use
standard Linux traffic control to pick the appropriate outgoing queue
based on the skbuf's tag.

This isn't really a namespace, and it isn't really a "metered class".
It's just a way of associating a piece of data (the network tag) with
a group of processes.

With a filesystem-based interface, it's easy to have a file as the
method of reading/writing the tag; with a system call interface, then
either the interface is sufficiently generic to allow this kind of
data association (in which case you're sort of implementing a
filesystem in the system call) or else you have to shoehorn into an
unrelated API (e.g. if your system call talks about "resource limits"
you might end up having to specify the network tag as a "maximum
limit" since there's no other useful configuration data available).

As another example, I'd like to have a subsystem that shows me all the
sockets that processes in the container have opened; again, easy to do
in a filesystem interface, but hard to fit into a
resource-meteting-centric or namespace-centric system call API.

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Sam Vilain
Allow me to annotate your nice summary. A lot of this is elaborating on
what you are saying; and I think where we disagree, the differences are
not important.

Paul Jackson wrote:
> We have actors, known as threads, tasks or processes, which use things,
> which are instances of such classes of things as disk partitions,
> file systems, memory, cpus, and semaphores.
>
> We assign names to these things, such as SysV id's to the semaphores,
> mount points to the file systems, pathnames to files and file
> descriptors to open files.  These names provide handles that
> are typically more convenient and efficient to use, but alas less
> persistent, less ubiquitous, and needing of some dereferencing when
> used, to identify the underlying thing.
>
> Any particular assignment of names to some of the things in particular
> class forms one namespace (aka 'space', above).  For each class of
> things, a given task is assigned one such namespace.  Typically many
> related tasks (such as all those of a login session or a job) will be
> assigned the same set of namespaces, leading to various opportunities
> for optimizing the management of namespaces in the kernel.
>
> This assignment of names to things is neither injective nor surjective
> nor even a complete map.
>
> For example, not all file systems are mounted, certainly not all
> possible mount points (all directories) serve as mount points,
> sometimes the same file system is mounted in multiple places, and
> sometimes more than one file system is mounted on the same mount point,
> one hiding the other.
>   

Right, which is why I preferred the term "mount space" or "mount
namespace". The keys in the map are not as important as the presence of
the independent map itself.

Unadorned "namespaces" is currently how they are known, and short of
becoming the Hurd I don't think this term is appropriate for Linux.

> In so far as the code managing this naming is concerned, the names are
> usually fairly arbitrary, except that there seems to be a tendency
> toward properly virtualizing these namespaces, presenting to a task
> the namespaces assigned it as if that was all there was, hiding the
> presence of alternative namespaces, and intentionally not providing a
> 'global view' that encompasses all namespaces of a given class.
>
> This tendency culminates in the full blown virtual machines, such as
> Xen and KVM, which virtualize more or less all namespaces.
>   

Yes, these systems, somewhat akin to microkernels, virtualize all
namespaces as a byproduct of their nature.

> Because the essential semantics relating one namespace to another are
> rather weak (the namespaces for any given class of things are or can
> be pretty much independent of each other), there is a preference and
> a tradition to keep such sets of namespaces a simple flat space.
>   

This has been the practice to date with most worked implementations,
with the proviso that as the feature becomes standard people may start
expecting spaces within spaces to work indistinguishably from top level
spaces; in fact, perhaps there should be no such distinction between a
"top" space and a subservient space, other than the higher level space
is aware of the subservient space.

Consider, for instance, that BIND already uses Linux kernel features
which are normally only attributed to the top space, such as adjusting
ulimits and unsetting capability bits. This kind of application
self-containment may become more commonplace.

And this perhaps begs the question: is it worth the performance penalty,
or must there be one?

> Conclusions regarding namespaces, aka spaces:
>
> A namespace provide a set of convenient handles for things of a
> particular class.
>
> For each class of things, every task gets one namespace (perhaps
> a Null or Default one.)
>   

Conceptually, every task exists in exactly one space of each type,
though that space may see and/or administer other spaces.

> Namespaces are partial virtualizations, the 'space of namespaces'
> is pretty flat, and the assignment of names in one namespace is
> pretty independent of the next.
>
> ===
>
> That much covers what I understand (perhaps in error) of namespaces.
>
> So what's this resource accounting/limits stuff?
>
> I think this depends on adding one more category to our universe.
>
> For the purposes of introducing yet more terms, I will call this
> new category a "metered class."
>   

The term "metered" implies "a resource which renews over time". How does
this apply to a fixed limit? A limit's nominal unit may not be delimited
in terms of time, but it must be continually maintained, so it can be
"metered" in terms of use of that limit over time.

For instance, a single system in scheduling terms is limited to the use
of the number of CPUs present in the system. So, while it has a "limit"
of 2 CPUs, in terms of a metered resource, it has a maximum rate of 2
CPU seconds per second.

> Each time we set about to manage some 

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Sam Vilain
Allow me to annotate your nice summary. A lot of this is elaborating on
what you are saying; and I think where we disagree, the differences are
not important.

Paul Jackson wrote:
 We have actors, known as threads, tasks or processes, which use things,
 which are instances of such classes of things as disk partitions,
 file systems, memory, cpus, and semaphores.

 We assign names to these things, such as SysV id's to the semaphores,
 mount points to the file systems, pathnames to files and file
 descriptors to open files.  These names provide handles that
 are typically more convenient and efficient to use, but alas less
 persistent, less ubiquitous, and needing of some dereferencing when
 used, to identify the underlying thing.

 Any particular assignment of names to some of the things in particular
 class forms one namespace (aka 'space', above).  For each class of
 things, a given task is assigned one such namespace.  Typically many
 related tasks (such as all those of a login session or a job) will be
 assigned the same set of namespaces, leading to various opportunities
 for optimizing the management of namespaces in the kernel.

 This assignment of names to things is neither injective nor surjective
 nor even a complete map.

 For example, not all file systems are mounted, certainly not all
 possible mount points (all directories) serve as mount points,
 sometimes the same file system is mounted in multiple places, and
 sometimes more than one file system is mounted on the same mount point,
 one hiding the other.
   

Right, which is why I preferred the term mount space or mount
namespace. The keys in the map are not as important as the presence of
the independent map itself.

Unadorned namespaces is currently how they are known, and short of
becoming the Hurd I don't think this term is appropriate for Linux.

 In so far as the code managing this naming is concerned, the names are
 usually fairly arbitrary, except that there seems to be a tendency
 toward properly virtualizing these namespaces, presenting to a task
 the namespaces assigned it as if that was all there was, hiding the
 presence of alternative namespaces, and intentionally not providing a
 'global view' that encompasses all namespaces of a given class.

 This tendency culminates in the full blown virtual machines, such as
 Xen and KVM, which virtualize more or less all namespaces.
   

Yes, these systems, somewhat akin to microkernels, virtualize all
namespaces as a byproduct of their nature.

 Because the essential semantics relating one namespace to another are
 rather weak (the namespaces for any given class of things are or can
 be pretty much independent of each other), there is a preference and
 a tradition to keep such sets of namespaces a simple flat space.
   

This has been the practice to date with most worked implementations,
with the proviso that as the feature becomes standard people may start
expecting spaces within spaces to work indistinguishably from top level
spaces; in fact, perhaps there should be no such distinction between a
top space and a subservient space, other than the higher level space
is aware of the subservient space.

Consider, for instance, that BIND already uses Linux kernel features
which are normally only attributed to the top space, such as adjusting
ulimits and unsetting capability bits. This kind of application
self-containment may become more commonplace.

And this perhaps begs the question: is it worth the performance penalty,
or must there be one?

 Conclusions regarding namespaces, aka spaces:

 A namespace provide a set of convenient handles for things of a
 particular class.

 For each class of things, every task gets one namespace (perhaps
 a Null or Default one.)
   

Conceptually, every task exists in exactly one space of each type,
though that space may see and/or administer other spaces.

 Namespaces are partial virtualizations, the 'space of namespaces'
 is pretty flat, and the assignment of names in one namespace is
 pretty independent of the next.

 ===

 That much covers what I understand (perhaps in error) of namespaces.

 So what's this resource accounting/limits stuff?

 I think this depends on adding one more category to our universe.

 For the purposes of introducing yet more terms, I will call this
 new category a metered class.
   

The term metered implies a resource which renews over time. How does
this apply to a fixed limit? A limit's nominal unit may not be delimited
in terms of time, but it must be continually maintained, so it can be
metered in terms of use of that limit over time.

For instance, a single system in scheduling terms is limited to the use
of the number of CPUs present in the system. So, while it has a limit
of 2 CPUs, in terms of a metered resource, it has a maximum rate of 2
CPU seconds per second.

 Each time we set about to manage some resource, we tend to construct
 some more elaborate metered classes out of the elemental 

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Paul Menage

On 3/11/07, Paul Jackson [EMAIL PROTECTED] wrote:


My current understanding of Paul Menage's container patch is that it is
a useful improvement for some of the metered classes - those that could
make good use of a file system like hierarchy for their interface.
It probably doesn't benefit all metered classes, as they won't all
benefit from a file system like hierarchy, or even have a formal name
space, and it doesn't seem to benefit the name space implementation,
which happily remains flat.


Well, what I was aiming at was a generic mechanism that can handle
namespaces, metered classes and other ways of providing
per-task-group behaviour. So a system-call API doesn't necessarily
have the right flexibility to implement the possible different kinds
of subsystems I envisage.

For example, one way to easily tie groups of processes to different
network queues is to have a tag associated with a container, allow
that to propagate to the socket/skbuf priority field, and then use
standard Linux traffic control to pick the appropriate outgoing queue
based on the skbuf's tag.

This isn't really a namespace, and it isn't really a metered class.
It's just a way of associating a piece of data (the network tag) with
a group of processes.

With a filesystem-based interface, it's easy to have a file as the
method of reading/writing the tag; with a system call interface, then
either the interface is sufficiently generic to allow this kind of
data association (in which case you're sort of implementing a
filesystem in the system call) or else you have to shoehorn into an
unrelated API (e.g. if your system call talks about resource limits
you might end up having to specify the network tag as a maximum
limit since there's no other useful configuration data available).

As another example, I'd like to have a subsystem that shows me all the
sockets that processes in the container have opened; again, easy to do
in a filesystem interface, but hard to fit into a
resource-meteting-centric or namespace-centric system call API.

Paul
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 02:06:03PM -0800, Paul Jackson wrote:
   if you create a 'resource container' to limit the
   usage of a set of resources for the processes
   belonging to this container, it would be kind of
   defeating the purpose, if you'd allow the processes
   to manipulate their limits, no?
 
 Wrong - this is not the only way.
 
 For instance in cpusets, -any- task in the system, regardless of what
 cpuset it is currently assigned to, might be able to manipulate -any-
 cpuset in the system.
 
 Yes -- some sufficient mechanism is required to keep tasks from
 escalating their resources or capabilities beyond an allowed point.
 
 But that mechanism might not be strictly based on position in some
 hierarchy.
 
 In the case of cpusets, it is based on the permissions on files in
 the cpuset file system (normally mounted at /dev/cpuset), versus
 the current priviledges and capabilities of the task.
 
 A root priviledged task in the smallest leaf node cpuset can manipulate
 every cpuset in the system.  This is an ordinary and common occurrence.

This assumes that you can see the global vfs namespace right?

What if you are inside a container/vserver which restricts your vfs
namespace? i.e /dev/cpusets seen from one container is not same as what
is seen from another container ..Is that a unrealistic scenario? IMHO
not so. This in-fact lets vservers and containers to work with each
other. So:

/dev/cpuset
|- C1   - Container A bound to this
|  |- C11
|  |- C12
|
|- C2   - Container B bound to this
   |- C21
   |- C22


C1 and C2 are two exclusive cpusets and containers/vservers A and B are bound 
to C1/C2 respectively.

From inside container/vserver A, if you were to look at /dev/cpuset, it will
-appear- as if you are in the top cpuset (with just C11 and C12 child
cpusets). It cannot modify C2 at all (since it has no visibility).

Similarly if you were to look at /dev/cpuset from inside B, it will list only 
C21/C22 with tasks in container B not being able to see C1 at all.

:)


-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Wed, Mar 07, 2007 at 03:59:19PM -0600, Serge E. Hallyn wrote:
  containers patches uses just a single pointer in the task_struct, and
  all tasks in the same set of containers (across all hierarchies) will
  share a single container_group object, which holds the actual pointers
  to container state.
 
 Yes, that's why this consolidation doesn't make sense to me.
 
 Especially considering again that we will now have nsproxies pointing to
 containers pointing to... nsproxies.

nsproxies needn't point to containers. It (or as Herbert pointed -
nsproxy-pid_ns) can have direct pointers to resource objects (whatever
struct container-subsys[] points to).


-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 02:09:35PM -0800, Paul Menage wrote:
  3. This next leads me to think that 'tasks' file in each directory doesnt 
  make
 sense for containers. In fact it can lend itself to error situations (by
 administrator/script mistake) when some tasks of a container are in one
 resource class while others are in a different class.
 
  Instead, from a containers pov, it may be usefull to write
  a 'container id' (if such a thing exists) into the tasks file
  which will move all the tasks of the container into
  the new resource class. This is the same requirement we
  discussed long back of moving all threads of a process into new
  resource class.
 
 I think you need to give a more concrete example and use case of what
 you're trying to propose here. I don't really see what advantage
 you're getting.

Ok, this is what I had in mind:


mount -t container -o ns /dev/namespace
mount -t container -o cpu /dev/cpu

Lets we have the namespaces/resource-groups created as under:

/dev/namespace
|-- prof
||- tasks - (T1, T2)
||- container_id - 1 (doesnt exist today perhaps)
|
|-- student
||- tasks - (T3, T4)
||- container_id - 2 (doesnt exist today perhaps)

/dev/cpu
   |-- prof
   ||-- tasks
   ||-- cpu_limit (40%)
   |
   |-- student
   ||-- tasks
   ||-- cpu_limit (20%)
   |
   |


Is it possible to create the above structure in container patches? 
/me thinks so.

If so, then accidentally someone can do this:

echo T1  /dev/cpu/prof/tasks
echo T2  /dev/cpu/student/tasks

with the result that tasks of the same container are now in different
resource classes.

Thats why in case of containers I felt we shldnt allow individual tasks
to be cat'ed to tasks file. 

Or rather, it may be nice to say :

echo cid 2  /dev/cpu/prof/tasks 

and have all tasks belonging to container id 2 move to the new resource
group.



-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Mon, Mar 12, 2007 at 07:31:48PM +0530, Srivatsa Vaddagiri wrote:
 not so. This in-fact lets vservers and containers to work with each
 other. So:

s/containers/cpusets

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Serge E. Hallyn
Quoting Srivatsa Vaddagiri ([EMAIL PROTECTED]):
 On Fri, Mar 09, 2007 at 02:09:35PM -0800, Paul Menage wrote:
   3. This next leads me to think that 'tasks' file in each directory doesnt 
   make
  sense for containers. In fact it can lend itself to error situations 
   (by
  administrator/script mistake) when some tasks of a container are in one
  resource class while others are in a different class.
  
   Instead, from a containers pov, it may be usefull to write
   a 'container id' (if such a thing exists) into the tasks file
   which will move all the tasks of the container into
   the new resource class. This is the same requirement we
   discussed long back of moving all threads of a process into new
   resource class.
  
  I think you need to give a more concrete example and use case of what
  you're trying to propose here. I don't really see what advantage
  you're getting.
 
 Ok, this is what I had in mind:
 
 
   mount -t container -o ns /dev/namespace
   mount -t container -o cpu /dev/cpu
 
 Lets we have the namespaces/resource-groups created as under:
 
   /dev/namespace
   |-- prof
   ||- tasks - (T1, T2)
   ||- container_id - 1 (doesnt exist today perhaps)
   |
   |-- student
   ||- tasks - (T3, T4)
   ||- container_id - 2 (doesnt exist today perhaps)
 
   /dev/cpu
  |-- prof
  ||-- tasks
  ||-- cpu_limit (40%)
  |
  |-- student
  ||-- tasks
  ||-- cpu_limit (20%)
  |
  |
 
 
 Is it possible to create the above structure in container patches? 
 /me thinks so.
 
 If so, then accidentally someone can do this:
 
   echo T1  /dev/cpu/prof/tasks
   echo T2  /dev/cpu/student/tasks
 
 with the result that tasks of the same container are now in different
 resource classes.

What's wrong with that?

 Thats why in case of containers I felt we shldnt allow individual tasks
 to be cat'ed to tasks file. 
 
 Or rather, it may be nice to say :
 
   echo cid 2  /dev/cpu/prof/tasks 
 
 and have all tasks belonging to container id 2 move to the new resource
 group.

Adding that feature sounds fine, but don't go stopping me from putting
T1 into /dev/cpu/prof/tasks and T2 into /dev/cpu/student/tasks just
because you have your own notion of what each task is supposed to be.
Just because they're in the same namespaces doesn't mean they should get
the same resource allocations.  If you want to add that kind of policy,
well, it should be policy - user-definable.

-serge
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
 What's wrong with that?

I had been asking around on what is the fundamental unit of res mgmt
for vservers and the answer I got (from Herbert) was all tasks that are
in the same pid namespace. From what you are saying above, it seems to
be that there is no such fundamental unit. It can be a random mixture
of tasks (taken across vservers) whose resource consumption needs to be
controlled. Is that correct?

  echo cid 2  /dev/cpu/prof/tasks 
 
 Adding that feature sounds fine, 

Ok yes ..that can be a optional feature.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Serge E. Hallyn
Quoting Srivatsa Vaddagiri ([EMAIL PROTECTED]):
 On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
  What's wrong with that?
 
 I had been asking around on what is the fundamental unit of res mgmt
 for vservers and the answer I got (from Herbert) was all tasks that are
 in the same pid namespace. From what you are saying above, it seems to
 be that there is no such fundamental unit. It can be a random mixture
 of tasks (taken across vservers) whose resource consumption needs to be
 controlled. Is that correct?

If I'm reading it right, yes.

If for vservers the fundamental unit of res mgmt is a vserver, that can
surely be done at a higher level than in the kernel.

Actually, these could be tied just by doing

mount -t container -o ns,cpuset /containers

So now any task in /containers/vserver1 or any subdirectory thereof
would have the same cpuset constraints as /containers.  OTOH, you could
mount them separately

mount -t container -o ns /nsproxy
mount -t container -o cpuset /cpuset

and now you have the freedom to split tasks in the same vserver
(under /nsproxy/vserver1) into different cpusets.

-serge

 echo cid 2  /dev/cpu/prof/tasks 
  
  Adding that feature sounds fine, 
 
 Ok yes ..that can be a optional feature.
 
 -- 
 Regards,
 vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Paul Jackson
vatsa wrote:
 This assumes that you can see the global vfs namespace right?
 
 What if you are inside a container/vserver which restricts your vfs
 namespace? i.e /dev/cpusets seen from one container is not same as what
 is seen from another container .

Well, yes.  But that restriction on the namespace is no doing of
cpusets.

It's some vfs namespace restriction, which should be an orthogonal
mechanism.

Well, it's probably not orthogonal at present.  Cpusets might not yet
handle a restricted vfs name space very well.

For example the /proc/pid/cpuset path, giving path below /dev/cpuset
of task pid's cpuset, might not be restricted.  And the set of all CPUs
and Memory Nodes that are online, which is visible in various /proc
files, and also visible in ones top cpuset, might be inconsistent if
restricted vfs namespace mapped you to a different top cpuset.

There are probably other loose ends as well.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Sam Vilain
Srivatsa Vaddagiri wrote:
 On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
   
 What's wrong with that?
 

 I had been asking around on what is the fundamental unit of res mgmt
 for vservers and the answer I got (from Herbert) was all tasks that are
 in the same pid namespace. From what you are saying above, it seems to
 be that there is no such fundamental unit. It can be a random mixture
 of tasks (taken across vservers) whose resource consumption needs to be
 controlled. Is that correct?
   

Sure, for instance, all postgres processes across all servers might be
put in a different IO and buffercache use container by the system
administrator.

Sam.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Herbert Poetzl
On Mon, Mar 12, 2007 at 03:00:25AM -0700, Paul Menage wrote:
 On 3/11/07, Paul Jackson [EMAIL PROTECTED] wrote:
 
  My current understanding of Paul Menage's container patch is that it is
  a useful improvement for some of the metered classes - those that could
  make good use of a file system like hierarchy for their interface.
  It probably doesn't benefit all metered classes, as they won't all
  benefit from a file system like hierarchy, or even have a formal name
  space, and it doesn't seem to benefit the name space implementation,
  which happily remains flat.
 
 Well, what I was aiming at was a generic mechanism that can handle
 namespaces, metered classes and other ways of providing
 per-task-group behaviour. So a system-call API doesn't necessarily
 have the right flexibility to implement the possible different kinds
 of subsystems I envisage.
 
 For example, one way to easily tie groups of processes to different
 network queues is to have a tag associated with a container, allow
 that to propagate to the socket/skbuf priority field, and then use
 standard Linux traffic control to pick the appropriate outgoing queue
 based on the skbuf's tag.
 
 This isn't really a namespace, and it isn't really a metered class.
 It's just a way of associating a piece of data (the network tag) with
 a group of processes.
 
 With a filesystem-based interface, it's easy to have a file as the
 method of reading/writing the tag; with a system call interface, then
 either the interface is sufficiently generic to allow this kind of
 data association (in which case you're sort of implementing a
 filesystem in the system call) or else you have to shoehorn into an
 unrelated API (e.g. if your system call talks about resource limits
 you might end up having to specify the network tag as a maximum
 limit since there's no other useful configuration data available).
 
 As another example, I'd like to have a subsystem that shows me all the
 sockets that processes in the container have opened; again, easy to do
 in a filesystem interface, but hard to fit into a
 resource-meteting-centric or namespace-centric system call API.

why? you simply enter that specific space and
use the existing mechanisms (netlink, proc, whatever)
to retrieve the information with _existing_ tools,
no need to do anything unusual via the syscall API

and if you support a 'spectator' context or capability
(which allows to see the _whole_ truth) you can also
get this information for _all_ sockets with existing
tools like netstat or lsof ...

best,
Herbert

 Paul
 ___
 Containers mailing list
 [EMAIL PROTECTED]
 https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Herbert Poetzl
On Mon, Mar 12, 2007 at 09:50:45PM +0530, Srivatsa Vaddagiri wrote:
 On Mon, Mar 12, 2007 at 10:56:43AM -0500, Serge E. Hallyn wrote:
  What's wrong with that?
 
 I had been asking around on what is the fundamental unit of res mgmt
 for vservers and the answer I got (from Herbert) was all tasks that are
 in the same pid namespace. From what you are saying above, it seems to
 be that there is no such fundamental unit. It can be a random mixture
 of tasks (taken across vservers) whose resource consumption needs to be
 controlled. Is that correct?

just means that the current Linux-VServer behaviour
is a subset of that, no problem there as long as
it really _is_ a subset :) we always like to provide
more features in the future, no problem with that :)

best,
Herbert

 echo cid 2  /dev/cpu/prof/tasks 
  
  Adding that feature sounds fine, 
 
 Ok yes ..that can be a optional feature.
 
 -- 
 Regards,
 vatsa
 ___
 Containers mailing list
 [EMAIL PROTECTED]
 https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Srivatsa Vaddagiri
On Tue, Mar 13, 2007 at 12:31:13AM +0100, Herbert Poetzl wrote:
 just means that the current Linux-VServer behaviour
 is a subset of that, no problem there as long as
 it really _is_ a subset :) we always like to provide
 more features in the future, no problem with that :)

Considering the example Sam quoted, doesn't it make sense to split
resource classes (some of them atleast) independent of each other?
That would also argue for providing multiple hierarchy feature in Paul's
patches.

Given that and the mail Serge sent on why nsproxy optimization is
usefull given numbers, can you reconsider your earlier proposals as
below:

- pid_ns and resource parameters should be in a single struct
  (point 1c, 7b in [1])

- pointers to resource controlling objects should be inserted
  in task_struct directly (instead of nsproxy indirection)
  (points 2c in [1])

[1] http://lkml.org/lkml/2007/3/12/138

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-12 Thread Paul Menage

On 3/12/07, Herbert Poetzl [EMAIL PROTECTED] wrote:


why? you simply enter that specific space and
use the existing mechanisms (netlink, proc, whatever)
to retrieve the information with _existing_ tools,


That's assuming that you're using network namespace virtualization,
with each group of tasks in a separate namespace. What if you don't
want the virtualization overhead, just the accounting?

Paul
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-11 Thread Paul Jackson
Sam, responding to Herbert:
> > from my personal PoV the following would be fine:
> >
> >  spaces (for the various 'spaces')
> >...
> >  container (for resource accounting/limits)
> >...
> 
> I like these a lot ...

Hmmm ... ok ...

Let me see if I understand this.

We have actors, known as threads, tasks or processes, which use things,
which are instances of such classes of things as disk partitions,
file systems, memory, cpus, and semaphores.

We assign names to these things, such as SysV id's to the semaphores,
mount points to the file systems, pathnames to files and file
descriptors to open files.  These names provide handles that
are typically more convenient and efficient to use, but alas less
persistent, less ubiquitous, and needing of some dereferencing when
used, to identify the underlying thing.

Any particular assignment of names to some of the things in particular
class forms one namespace (aka 'space', above).  For each class of
things, a given task is assigned one such namespace.  Typically many
related tasks (such as all those of a login session or a job) will be
assigned the same set of namespaces, leading to various opportunities
for optimizing the management of namespaces in the kernel.

This assignment of names to things is neither injective nor surjective
nor even a complete map.

For example, not all file systems are mounted, certainly not all
possible mount points (all directories) serve as mount points,
sometimes the same file system is mounted in multiple places, and
sometimes more than one file system is mounted on the same mount point,
one hiding the other.

In so far as the code managing this naming is concerned, the names are
usually fairly arbitrary, except that there seems to be a tendency
toward properly virtualizing these namespaces, presenting to a task
the namespaces assigned it as if that was all there was, hiding the
presence of alternative namespaces, and intentionally not providing a
'global view' that encompasses all namespaces of a given class.

This tendency culminates in the full blown virtual machines, such as
Xen and KVM, which virtualize more or less all namespaces.

Because the essential semantics relating one namespace to another are
rather weak (the namespaces for any given class of things are or can
be pretty much independent of each other), there is a preference and
a tradition to keep such sets of namespaces a simple flat space.

Conclusions regarding namespaces, aka spaces:

A namespace provide a set of convenient handles for things of a
particular class.

For each class of things, every task gets one namespace (perhaps
a Null or Default one.)

Namespaces are partial virtualizations, the 'space of namespaces'
is pretty flat, and the assignment of names in one namespace is
pretty independent of the next.

===

That much covers what I understand (perhaps in error) of namespaces.

So what's this resource accounting/limits stuff?

I think this depends on adding one more category to our universe.

For the purposes of introducing yet more terms, I will call this
new category a "metered class."

Each time we set about to manage some resource, we tend to construct
some more elaborate "metered classes" out of the elemental classes
of things (partitions, cpus, ...) listed above.

Examples of these more elaborate metered classes include percentages
of a networks bandwidth, fractions of a nodes memory (the fake numa
patch), subsets of the systems cpus and nodes (cpusets), ...

These more elaborate metered classes each have fairly 'interesting'
and specialized forms.  Their semantics are closely adapted to the
underlying class of things from which they are formed, and to the
usually challenging, often conflicting, constraints on managing the
usage of such a resource.

For example, the rules that apply to percentages of a networks
bandwidth have little in common with the rules that apply to sets of
subsets of a systems cpus and nodes.

We then attach tasks to these metered classes.  Each task is assigned
one metered instance from each metered class.  For example, each task
is assigned to a cpuset.

For metered classes that are visible across the system, we tend
to name these classes, and then use those names when attaching
tasks to them.  See for example cpusets.

For metered classes that are only privately visible within the
current context of a task, such as setrlimit, set_mempolicy,
mbind and set_mempolicy, we tend to implicitly attach each task
to its current metered class and provide it explicit means
to manipulate the individual attributes of that metered class
by direct system calls.

Conclusions regarding metered classes, aka containers:

Unlike namespaces, metered classes have rich and varied semantics,
sometimes elaborate inheritance and transfer rules, and frequently
non-flat topologies.

Depending on the scope of visibility of a metered class, it may
or may not have much of a formal name space.


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-11 Thread Paul Jackson
Sam, responding to Herbert:
  from my personal PoV the following would be fine:
 
   spaces (for the various 'spaces')
 ...
   container (for resource accounting/limits)
 ...
 
 I like these a lot ...

Hmmm ... ok ...

Let me see if I understand this.

We have actors, known as threads, tasks or processes, which use things,
which are instances of such classes of things as disk partitions,
file systems, memory, cpus, and semaphores.

We assign names to these things, such as SysV id's to the semaphores,
mount points to the file systems, pathnames to files and file
descriptors to open files.  These names provide handles that
are typically more convenient and efficient to use, but alas less
persistent, less ubiquitous, and needing of some dereferencing when
used, to identify the underlying thing.

Any particular assignment of names to some of the things in particular
class forms one namespace (aka 'space', above).  For each class of
things, a given task is assigned one such namespace.  Typically many
related tasks (such as all those of a login session or a job) will be
assigned the same set of namespaces, leading to various opportunities
for optimizing the management of namespaces in the kernel.

This assignment of names to things is neither injective nor surjective
nor even a complete map.

For example, not all file systems are mounted, certainly not all
possible mount points (all directories) serve as mount points,
sometimes the same file system is mounted in multiple places, and
sometimes more than one file system is mounted on the same mount point,
one hiding the other.

In so far as the code managing this naming is concerned, the names are
usually fairly arbitrary, except that there seems to be a tendency
toward properly virtualizing these namespaces, presenting to a task
the namespaces assigned it as if that was all there was, hiding the
presence of alternative namespaces, and intentionally not providing a
'global view' that encompasses all namespaces of a given class.

This tendency culminates in the full blown virtual machines, such as
Xen and KVM, which virtualize more or less all namespaces.

Because the essential semantics relating one namespace to another are
rather weak (the namespaces for any given class of things are or can
be pretty much independent of each other), there is a preference and
a tradition to keep such sets of namespaces a simple flat space.

Conclusions regarding namespaces, aka spaces:

A namespace provide a set of convenient handles for things of a
particular class.

For each class of things, every task gets one namespace (perhaps
a Null or Default one.)

Namespaces are partial virtualizations, the 'space of namespaces'
is pretty flat, and the assignment of names in one namespace is
pretty independent of the next.

===

That much covers what I understand (perhaps in error) of namespaces.

So what's this resource accounting/limits stuff?

I think this depends on adding one more category to our universe.

For the purposes of introducing yet more terms, I will call this
new category a metered class.

Each time we set about to manage some resource, we tend to construct
some more elaborate metered classes out of the elemental classes
of things (partitions, cpus, ...) listed above.

Examples of these more elaborate metered classes include percentages
of a networks bandwidth, fractions of a nodes memory (the fake numa
patch), subsets of the systems cpus and nodes (cpusets), ...

These more elaborate metered classes each have fairly 'interesting'
and specialized forms.  Their semantics are closely adapted to the
underlying class of things from which they are formed, and to the
usually challenging, often conflicting, constraints on managing the
usage of such a resource.

For example, the rules that apply to percentages of a networks
bandwidth have little in common with the rules that apply to sets of
subsets of a systems cpus and nodes.

We then attach tasks to these metered classes.  Each task is assigned
one metered instance from each metered class.  For example, each task
is assigned to a cpuset.

For metered classes that are visible across the system, we tend
to name these classes, and then use those names when attaching
tasks to them.  See for example cpusets.

For metered classes that are only privately visible within the
current context of a task, such as setrlimit, set_mempolicy,
mbind and set_mempolicy, we tend to implicitly attach each task
to its current metered class and provide it explicit means
to manipulate the individual attributes of that metered class
by direct system calls.

Conclusions regarding metered classes, aka containers:

Unlike namespaces, metered classes have rich and varied semantics,
sometimes elaborate inheritance and transfer rules, and frequently
non-flat topologies.

Depending on the scope of visibility of a metered class, it may
or may not have much of a formal name space.

===

My current 

Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-10 Thread Paul Jackson
Sam wrote:
> But when you apply this to something like cpusets, it gets a little abstract.

Just a tad abstract .

Thanks.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-10 Thread Sam Vilain
Herbert Poetzl wrote:
> On Wed, Mar 07, 2007 at 11:44:58PM -0700, Eric W. Biederman wrote:
>   
>> I really don't much care as long as we don't start redefining
>> container as something else.  I think the IBM guys took it from
>> solaris originally which seems to define a zone as a set of
>> isolated processes (for us all separate namespaces).  And a container
>> as a set of as a zone that uses resource control.  Not exactly how
>> we have been using the term but close enough not to confuse someone.
>>
>> As long as we don't go calling the individual subsystems or the
>> process groups they need to function a container I really don't care.
>> [...]
>> Resource groups at least for subset of subsystems that aren't
>> namespaces sounds reasonable.  Heck resource group, resource
>> controller, resource subsystem, resource just about anything seems
>> sane to me.
>>
>> The important part is that we find a vocabulary without doubly
>> defined words so we can communicate and a small common set we can
>> agree on so people can work on and implement the individual
>> resource controllers/groups, and get the individual pieces merged
>> as they are reading.
>> 
>
> from my personal PoV the following would be fine:
>
>  spaces (for the various 'spaces')
>
>   - similar enough to the old namespace
>   - can be easily used with prefix/postfix
> like in pid_space, mnt_space, uts_space etc
>   - AFAIK, it is not used yet for anything else
>
>  container (for resource accounting/limits)
>
>   - has the 'containment' principle built in :)
>   - is used in similar ways in other solutions
>   - sounds similar to context (easy to associate)
>
> note: I'm also fine with other names, as long as
> we find some useable vocabulary soon, [...]

I like these a lot, particularly in that "mount space" could be a
reasonable replacement for "namespace".

As a result of this discussion, I see the sense in Paul Menage's
original choice of term.

There's just one problem.  We'd have to rename the mailing list to
"spaces and containers" :-)

Sam.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-10 Thread Sam Vilain
Paul Jackson wrote:
>> But "namespace" has well-established historical semantics too - a way
>> of changing the mappings of local * to global objects. This
>> accurately describes things liek resource controllers, cpusets, resource
>> monitoring, etc.
>> 
>
> No!
>
> Cpusets don't rename or change the mapping of objects.
>
> I suspect you seriously misunderstand cpusets and are trying to cram them
> into a 'namespace' remapping role into which they don't fit.
>   

Look, you're absolutely right, I'm stretching the terms much too far.

namespaces implies some kind of domain, which is the namespace, and
entities within the domain, which are the names, and there is a (task,
domain) mapping. I was thinking that this implies all similar (task,
domain) mappings could be treated in the same way. But when you apply
this to something like cpusets, it gets a little abstract. Like the
entities are (task,cpu) pairs and the domains the set of cpus that a
process can run on.

Sam.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-10 Thread Sam Vilain
Paul Jackson wrote:
 But namespace has well-established historical semantics too - a way
 of changing the mappings of local * to global objects. This
 accurately describes things liek resource controllers, cpusets, resource
 monitoring, etc.
 

 No!

 Cpusets don't rename or change the mapping of objects.

 I suspect you seriously misunderstand cpusets and are trying to cram them
 into a 'namespace' remapping role into which they don't fit.
   

Look, you're absolutely right, I'm stretching the terms much too far.

namespaces implies some kind of domain, which is the namespace, and
entities within the domain, which are the names, and there is a (task,
domain) mapping. I was thinking that this implies all similar (task,
domain) mappings could be treated in the same way. But when you apply
this to something like cpusets, it gets a little abstract. Like the
entities are (task,cpu) pairs and the domains the set of cpus that a
process can run on.

Sam.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-10 Thread Sam Vilain
Herbert Poetzl wrote:
 On Wed, Mar 07, 2007 at 11:44:58PM -0700, Eric W. Biederman wrote:
   
 I really don't much care as long as we don't start redefining
 container as something else.  I think the IBM guys took it from
 solaris originally which seems to define a zone as a set of
 isolated processes (for us all separate namespaces).  And a container
 as a set of as a zone that uses resource control.  Not exactly how
 we have been using the term but close enough not to confuse someone.

 As long as we don't go calling the individual subsystems or the
 process groups they need to function a container I really don't care.
 [...]
 Resource groups at least for subset of subsystems that aren't
 namespaces sounds reasonable.  Heck resource group, resource
 controller, resource subsystem, resource just about anything seems
 sane to me.

 The important part is that we find a vocabulary without doubly
 defined words so we can communicate and a small common set we can
 agree on so people can work on and implement the individual
 resource controllers/groups, and get the individual pieces merged
 as they are reading.
 

 from my personal PoV the following would be fine:

  spaces (for the various 'spaces')

   - similar enough to the old namespace
   - can be easily used with prefix/postfix
 like in pid_space, mnt_space, uts_space etc
   - AFAIK, it is not used yet for anything else

  container (for resource accounting/limits)

   - has the 'containment' principle built in :)
   - is used in similar ways in other solutions
   - sounds similar to context (easy to associate)

 note: I'm also fine with other names, as long as
 we find some useable vocabulary soon, [...]

I like these a lot, particularly in that mount space could be a
reasonable replacement for namespace.

As a result of this discussion, I see the sense in Paul Menage's
original choice of term.

There's just one problem.  We'd have to rename the mailing list to
spaces and containers :-)

Sam.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-10 Thread Paul Jackson
Sam wrote:
 But when you apply this to something like cpusets, it gets a little abstract.

Just a tad abstract grin.

Thanks.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Sat, Mar 10, 2007 at 07:32:20AM +0530, Srivatsa Vaddagiri wrote:
> Ok, let me see if I can convey what I had in mind better:
> 
>   uts_ns pid_ns ipc_ns
>   \|/
>   ---
>  | nsproxy|
>   
>  /  |   \\ <-- 'nsproxy' pointer
>   T1  T2  T3 ...T1000
>   |   |   |  | <-- 'containers' pointer (4/8 KB for 1000 task)
>  ---
> | container_group   |
>  --   
>   /
>--
>   | container |
>--
>   |
>--
>   | cpu_limit |
>-- 

[snip]

> We save on 4/8 KB (for 1000 tasks) by avoiding the 'containers' pointer
> in each task_struct (just to get to the resource limit information).

Having the 'containers' pointer in each task-struct is great from a
non-container res mgmt perspective. It lets you dynamically decide what
is the fundamental unit of res mgmt. 

It could be {T1, T5} tasks/threads of a process, or {T1, T3, T8, T10} tasks of 
a session (for limiting login time per session), or {T1, T2 ..T10, T18, T27} 
tasks of a user etc.

But from a vserver/container pov, this level flexibility (at a -task- level) of 
deciding the unit of res mgmt is IMHO not needed. The
vserver/container/namespace (tsk->nsproxy->some_ns) to which a task 
belongs automatically defines that unit of res mgmt.


-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Herbert Poetzl
On Sat, Mar 10, 2007 at 12:11:05AM +0530, Srivatsa Vaddagiri wrote:
> On Fri, Mar 09, 2007 at 02:16:08AM +0100, Herbert Poetzl wrote:
> > On Thu, Mar 08, 2007 at 05:00:54PM +0530, Srivatsa Vaddagiri wrote:
> > > On Thu, Mar 08, 2007 at 01:50:01PM +1300, Sam Vilain wrote:
> > > > 7. resource namespaces
> > > 
> > > It should be. Imagine giving 20% bandwidth to a user X. X wants to
> > > divide this bandwidth further between multi-media (10%), kernel
> > > compilation (5%) and rest (5%). So,
> > 
> > sounds quite nice, but ...
> > 
> > > > Is the subservient namespace's resource usage counting against
> > > > ours too?
> > > 
> > > Yes, the resource usage of children should be accounted when capping
> > > parent resource usage.
> > 
> > it will require to do accounting many times
> > (and limit checks of course), which in itself
> > might be a way to DoS the kernel by creating
> > more and more resource groups
> 
> I was only pointing out the usefullness of the feature and not
> necessarily saying it -should- be implemented! Ofcourse I understand it
> will make the controller complicated and thats why probably none of the
> recontrollers we are seeing posted on lkml don't support hierarchical
> res mgmt.
> 
> > > > Can we dynamically alter the subservient namespace's resource
> > > > allocations?
> > > 
> > > Should be possible yes. That lets user X completely manage his
> > > allocation among whatever sub-groups he creates.
> > 
> > what happens if the parent changes, how is
> > the resource change (if it was a reduction)
> > propagated to the children?
> >
> > e.g. your guest has 1024 file handles, now
> > you reduce it to 512, but the guest had two
> > children, both with 256 file handles each ...
> 
> I believe CKRM handled this quite neatly (by defining child shares to be
> relative to parent shares). 
> 
> In your example, 256+256 add up to 512 which is within the parent's
> new limit, so nothing happens :) 

yes, but that might as well be fatal, because now the
children can easily DoS the parent by using up all the
file handles, where the 'original' setup (2 x 256)
left 512 file handles 'reserved' ...

of course, you could as well have adjusted that to
2 x 128 + 256 for the parent, but that is policy and
IMHO policy does not belong into the kernel, it should
be handled by userspace (maybe invoked by the kernel
in some kind of helper functionality or so)

> You also picked an example of exhaustible/non-reclaimable resource,
> which makes it hard to define what should happen if parent's limit
> goes below 512.

which was quite intentional, and brings us to another
issues when adjusting resource limits (not even in
a hierarchical way)

> Either nothing happens or perhaps a task is killed, don't know.

> In case of memory, I would say that some of child's pages may 
> get kicked out and in case of cpu, child will start getting fewer
> cycles.

btw, kicking out pages when rss limit is reached might
be the obvious choice (if we think Virtual Machine here)
but it might not be the best choice from the overall
performance PoV, which might be much better off by
keeping the page in memory (if there is enough memory
available) but penalizing the guest like the page was
actually kicked out (and needs to be fetched later on)

note: this is something we should think about when we
want to address specific limits like RSS, because IMHO
we should not optimize for the single guest case, but
for the big picture ...

> > > The patches should give visibility to both nsproxy objects (by
> > > showing what tasks share the same nsproxy objects and letting
> > > tasks move across nsproxy objects if allowed) and the resource
> > > control objects pointed to by nsproxy (struct cpuset, struct
> > > cpu_limit, struct rss_limit etc).
> > 
> > the nsproxy is not really relevant, as it
> > is some kind of strange indirection, which
> > does not necessarily depict the real relations,
> > regardless wether you do the re-sharing of
> > those nsproies or not .. 
> 
> So what are you recommending we do instead? 
> My thought was whatever is the fundamental unit to which resource
> management needs to be applied, lets store resource parameters (or
> pointers to them) there (rather than duplicating the information in
> each task_struct).

we do not want to duplicate any information in the task
struct, but we might want to put some (or maybe all)
of the spaces back (as pointer reference) to the task
struct, just to avoid the nsproxy indirection

note that IMHO not all spaces make sense to be separated
e.g. while it is quite useful to have network and pid
space separated, others might be joined to form larger
consistant structures ...

for example, I could as well live with pid and resource
accounting/limits sharing one common struct/space ...
(doesn't mean that separate spaces are not nice :)

best,
Herbert

> -- 
> Regards,
> vatsa
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> 

Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
I think maybe I didnt communicate what I mean by a container here
(although I thought I did). I am referring to a container in a vserver
context (set of tasks which share the same namespace).

On Fri, Mar 09, 2007 at 02:09:35PM -0800, Paul Menage wrote:
> >2. Regarding space savings, if 100 tasks are in a container (I dont know
> >   what is a typical number) -and- lets say that all tasks are to share
> >   the same resource allocation (which seems to be natural), then having
> >   a 'struct container_group *' pointer in each task_struct seems to be not
> >   very efficient (simply because we dont need that task-level granularity 
> >   of
> >   managing resource allocation).
> 
> I think you should re-read my patches.
> 
> Previously, each task had N pointers, one for its container in each
> potential hierarchy. The container_group concept means that each task
> has 1 pointer, to a set of container pointers (one per hierarchy)
> shared by all tasks that have exactly the same set of containers (in
> the various different hierarchies).

Ok, let me see if I can convey what I had in mind better:

uts_ns pid_ns ipc_ns
\|/
---
   | nsproxy|

 /  |   \\ <-- 'nsproxy' pointer
T1  T2  T3 ...T1000
|   |   |  | <-- 'containers' pointer (4/8 KB for 1000 task)
   ---
  | container_group   |
   --   
/
 --
| container |
 --
|
 --
| cpu_limit |
 -- 

(T1, T2, T3 ..T1000) are part of a vserver lets say sharing the same
uts/pid/ipc_ns. Now where do we store the resource control information
for this unit/set-of-tasks in your patches?

(tsk->containers->container[cpu_ctlr.hierarchy] + X)->cpu_limit 

(The X is to account for the fact that cotainer structure points to a
'struct container_subsys_state' embedded in some other structure. Its
usually zero if the structure is embedded at the top)

I understand that container_group also points directly to
'struct container_subsys_state', in which case, the above is optimized
to:

(tsk->containers->subsys[cpu_ctlr.subsys_id] + X)->cpu_limit

Did I get that correct?

Compare that to:

   ---
  | cpu_limit |
uts_ns pid_ns ipc_ns   --
\|/ |

   |nsproxy |

 /  |   \|
T1  T2  T3 .T1000

We save on 4/8 KB (for 1000 tasks) by avoiding the 'containers' pointer
in each task_struct (just to get to the resource limit information).

So my observation was (again note primarily from a vserver context): given that 
(T1, T2, T3 ..T1000) will all need to be managed as a unit (because they are 
all sharing the same nsproxy pointer), then having the '->containers' pointer 
in -each- one of them to tell the unit's limit is not optimal. Instead store 
the limit in the proper unit structure (in this case nsproxy - but
whatever else is more suitable vserver datastructure (pid_ns?) which
represent the fundamental unit of res mgmt in vservers).

(I will respond to remaining comments later ..too early in the morning now!)

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Paul Menage

On 3/9/07, Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:


1. What is the fundamental unit over which resource-management is
applied? Individual tasks or individual containers?

/me thinks latter.


Yes


In which case, it makes sense to stick
resource control information in the container somewhere.


Yes, that's what all my patches have been doing.


2. Regarding space savings, if 100 tasks are in a container (I dont know
   what is a typical number) -and- lets say that all tasks are to share
   the same resource allocation (which seems to be natural), then having
   a 'struct container_group *' pointer in each task_struct seems to be not
   very efficient (simply because we dont need that task-level granularity of
   managing resource allocation).


I think you should re-read my patches.

Previously, each task had N pointers, one for its container in each
potential hierarchy. The container_group concept means that each task
has 1 pointer, to a set of container pointers (one per hierarchy)
shared by all tasks that have exactly the same set of containers (in
the various different hierarchies).

It doesn't give task-level granularity of resource management (unless
you create a separate container for each task), it just gives a space
saving.



3. This next leads me to think that 'tasks' file in each directory doesnt make
   sense for containers. In fact it can lend itself to error situations (by
   administrator/script mistake) when some tasks of a container are in one
   resource class while others are in a different class.

Instead, from a containers pov, it may be usefull to write
a 'container id' (if such a thing exists) into the tasks file
which will move all the tasks of the container into
the new resource class. This is the same requirement we
discussed long back of moving all threads of a process into new
resource class.


I think you need to give a more concrete example and use case of what
you're trying to propose here. I don't really see what advantage
you're getting.

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Paul Jackson
> the emphasis here is on 'from inside' which basically
> boils down to the following:
> 
>  if you create a 'resource container' to limit the
>  usage of a set of resources for the processes
>  belonging to this container, it would be kind of
>  defeating the purpose, if you'd allow the processes
>  to manipulate their limits, no?

Wrong - this is not the only way.

For instance in cpusets, -any- task in the system, regardless of what
cpuset it is currently assigned to, might be able to manipulate -any-
cpuset in the system.

Yes -- some sufficient mechanism is required to keep tasks from
escalating their resources or capabilities beyond an allowed point.

But that mechanism might not be strictly based on position in some
hierarchy.

In the case of cpusets, it is based on the permissions on files in
the cpuset file system (normally mounted at /dev/cpuset), versus
the current priviledges and capabilities of the task.

A root priviledged task in the smallest leaf node cpuset can manipulate
every cpuset in the system.  This is an ordinary and common occurrence.

I say again, as you seem to be skipping over this detail, one
advantage of basing an API on a file system is the usefulness of
the file system permission model (the -rwxrwxrwx permissions and
the uid/gid owners on each file and directory node).

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Herbert Poetzl
On Fri, Mar 09, 2007 at 11:49:08PM +0530, Srivatsa Vaddagiri wrote:
> On Fri, Mar 09, 2007 at 01:53:57AM +0100, Herbert Poetzl wrote:
>>> The real trick is that I believe these groupings are designed to
>>> be something you can setup on login and then not be able to switch
>>> out of. Which means we can't use sessions and process groups as the
>>> grouping entities as those have different semantics.
>> 
>> precisely, once you are inside a resource container, you
>> must not have the ability to modify its limits, and to
>> some degree, you should not know about the actual available
>> resources, but only about the artificial limits

the emphasis here is on 'from inside' which basically
boils down to the following:

 if you create a 'resource container' to limit the
 usage of a set of resources for the processes
 belonging to this container, it would be kind of
 defeating the purpose, if you'd allow the processes
 to manipulate their limits, no?

> From non-container workload management perspective, we do desire
> dynamic manipulation of limits associated with a group and also the
> ability to move tasks across resource-classes/groups.

the above doesn't mean that there aren't processes
_outside_ the resource container which have the
necessary capabilities to manipulate the container
in any way (changing limits dynamically, moving
tasks in and out of the container, etc ...)

best,
Herbert

> -- 
> Regards,
> vatsa
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Paul Jackson
Herbert wrote (and vatsa quoted):
> precisely, once you are inside a resource container, you
> must not have the ability to modify its limits, and to
> some degree, you should not know about the actual available
> resources, but only about the artificial limits

Not necessarily.  Depending on the resource we are managing, and on how
all encompassing one chooses to make the virtualization, this might be
an overly Draconian permission model.

Certainly in cpusets, any task can see the the whole system view, if
there are fairly typical permissions on some /dev/cpuset files.  A task
might even be able to change those limits for any task on the system,
if it has stronger priviledge.

Whether or not to really virtualize something, so that the contained
task can't tell it is in side a cacoon, is a somewhat separate question
from whether or not to limit a tasks use of a particular precious
resource.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 02:16:08AM +0100, Herbert Poetzl wrote:
> On Thu, Mar 08, 2007 at 05:00:54PM +0530, Srivatsa Vaddagiri wrote:
> > On Thu, Mar 08, 2007 at 01:50:01PM +1300, Sam Vilain wrote:
> > > 7. resource namespaces
> > 
> > It should be. Imagine giving 20% bandwidth to a user X. X wants to
> > divide this bandwidth further between multi-media (10%), kernel
> > compilation (5%) and rest (5%). So,
> 
> sounds quite nice, but ...
> 
> > > Is the subservient namespace's resource usage counting against ours too?
> > 
> > Yes, the resource usage of children should be accounted when capping
> > parent resource usage.
> 
> it will require to do accounting many times
> (and limit checks of course), which in itself
> might be a way to DoS the kernel by creating
> more and more resource groups

I was only pointing out the usefullness of the feature and not
necessarily saying it -should- be implemented! Ofcourse I understand it
will make the controller complicated and thats why probably none of the
recontrollers we are seeing posted on lkml don't support hierarchical
res mgmt.

> > > Can we dynamically alter the subservient namespace's resource
> > > allocations?
> > 
> > Should be possible yes. That lets user X completely manage his
> > allocation among whatever sub-groups he creates.
> 
> what happens if the parent changes, how is
> the resource change (if it was a reduction)
> propagated to the children?
>
> e.g. your guest has 1024 file handles, now
> you reduce it to 512, but the guest had two
> children, both with 256 file handles each ...

I believe CKRM handled this quite neatly (by defining child shares to be
relative to parent shares). 

In your example, 256+256 add up to 512 which is within the parent's new limit, 
so nothing happens :) You also picked an example of exhaustible/non-reclaimable 
resource, which makes it hard to define what should happen if parent's limit 
goes below 512. Either nothing happens or perhaps a task is killed,
don't know. In case of memory, I would say that some of child's pages may 
get kicked out and in case of cpu, child will start getting fewer
cycles.

> > The patches should give visibility to both nsproxy objects (by showing
> > what tasks share the same nsproxy objects and letting tasks move across
> > nsproxy objects if allowed) and the resource control objects pointed to
> > by nsproxy (struct cpuset, struct cpu_limit, struct rss_limit etc).
> 
> the nsproxy is not really relevant, as it
> is some kind of strange indirection, which
> does not necessarily depict the real relations,
> regardless wether you do the re-sharing of
> those nsproies or not .. 

So what are you recommending we do instead? My thought was whatever is
the fundamental unit to which resource management needs to be applied,
lets store resource parameters (or pointers to them) there (rather than 
duplicating the information in each task_struct).

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 01:53:57AM +0100, Herbert Poetzl wrote:
> > The real trick is that I believe these groupings are designed to
> > be something you can setup on login and then not be able to switch
> > out of. Which means we can't use sessions and process groups as the
> > grouping entities as those have different semantics.
> 
> precisely, once you are inside a resource container, you
> must not have the ability to modify its limits, and to
> some degree, you should not know about the actual available
> resources, but only about the artificial limits

>From non-container workload management perspective, we do desire dynamic
manipulation of limits associated with a group and also the ability to move 
tasks across resource-classes/groups.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Serge E. Hallyn
Quoting Paul Menage ([EMAIL PROTECTED]):
> On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
> >
> >Ok, they share this characteristic with namespaces: that they group
> >processes.

Namespaces have a side effect of grouping processes, but a namespace is
not defined by 'grouping proceses.'  A container is, in fact, a group of
processes.

> > So, they conceptually hang off task_struct.  But we put them
> >on ns_proxy because we've got this vague notion that things might be
> >better that way.
> 
> Remember that I'm not the one pushing to move them into ns_proxy.
> These patches are all Srivatsa's work. Despite that fact that they say
> "Signed-off-by: Paul Menage", I'd never seen them before they were
> posted to LKML, and I'm not sure that they're the right approach.
> (Although some form of unification might be good).

The nsproxy container subsystem could be said to be that unification.
If we really wanted to I suppose we could now always mount the nsproxy
subsystem, get rid of tsk->nsproxy, and always get thta through it's
nsproxy subsystem container.  But then that causes trouble with being
able to mount a hierarachy like

mount -t container -o ns,cpuset

so we'd have to fix something.  It also slows things down...

> >>> about this you still insist on calling this sub-system specific stuff
> >>> the "container",
> >>>
> >> Uh, no. I'm trying to call a *grouping* of processes a container.
> >>
> >
> >Ok, so is this going to supplant the namespaces too?
> 
> I don't know. It would be nice to have a single object hanging off the
> task struct that contains all the various grouping pointers. Having

The namespaces aren't grouping pointers, they are resource id tables.

I stand by my earlier observation that placing namespace pointers and
grouping pointers in the same structure means that pointer will end up
pointing to itself.

> something that was flexible enough to handle all the required
> behaviours, or else allowing completely different behaviours for
> different subsets of that structure, could be the fiddly bit.
> 
> See my expanded reply to Eric' earlier post for a possible way of
> unifying them, and simplifying the nsproxy and container.c code in the
> process.

Doesn't ring a bell, I'll have to look around for that...

> >
> >  - resource groups (I get a strange feeling of d?j? v? there)
> 
> Resource groups isn't a terrible name for them (although I'd be

I still like 'rug' for resource usage groups :)

-serge
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 10:04:30PM +0530, Srivatsa Vaddagiri wrote:
> 2. Regarding space savings, if 100 tasks are in a container (I dont know
>what is a typical number) -and- lets say that all tasks are to share
>the same resource allocation (which seems to be natural), then having
>a 'struct container_group *' pointer in each task_struct seems to be not 
>very efficient (simply because we dont need that task-level granularity of
>managing resource allocation).

Note that this 'struct container_group *' pointer is in addition to the
'struct nsproxy *' pointer already in task_struct. If the set of tasks
over which resorce control is applied is typically the same set of tasks
which share the same 'struct nsproxy *' pointer, then IMHO 'struct
container_group *' in each task_struct is not very optimal.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Wed, Mar 07, 2007 at 01:20:18PM -0800, Paul Menage wrote:
> On 3/7/07, Serge E. Hallyn <[EMAIL PROTECTED]> wrote:
> >
> >All that being said, if it were going to save space without overly
> >complicating things I'm actually not opposed to using nsproxy, but it
> 
> If space-saving is the main issue, then the latest version of my
> containers patches uses just a single pointer in the task_struct, and
> all tasks in the same set of containers (across all hierarchies) will
> share a single container_group object, which holds the actual pointers
> to container state.

Paul,
Some more thoughts, mostly coming from the point of view of
vservers/containers/"whaever is the set of tasks sharing a nsproxy is
called".

1. What is the fundamental unit over which resource-management is
applied? Individual tasks or individual containers?

/me thinks latter. In which case, it makes sense to stick 
resource control information in the container somewhere.
Just like when controlling a user's resource consumption, 
'struct user_struct' may be a natural place to put these resource 
limits.

2. Regarding space savings, if 100 tasks are in a container (I dont know
   what is a typical number) -and- lets say that all tasks are to share
   the same resource allocation (which seems to be natural), then having
   a 'struct container_group *' pointer in each task_struct seems to be not 
   very efficient (simply because we dont need that task-level granularity of
   managing resource allocation).

3. This next leads me to think that 'tasks' file in each directory doesnt make 
   sense for containers. In fact it can lend itself to error situations (by 
   administrator/script mistake) when some tasks of a container are in one 
   resource class while others are in a different class.

Instead, from a containers pov, it may be usefull to write
a 'container id' (if such a thing exists) into the tasks file
which will move all the tasks of the container into 
the new resource class. This is the same requirement we
discussed long back of moving all threads of a process into new 
resource class.


-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Wed, Mar 07, 2007 at 01:20:18PM -0800, Paul Menage wrote:
 On 3/7/07, Serge E. Hallyn [EMAIL PROTECTED] wrote:
 
 All that being said, if it were going to save space without overly
 complicating things I'm actually not opposed to using nsproxy, but it
 
 If space-saving is the main issue, then the latest version of my
 containers patches uses just a single pointer in the task_struct, and
 all tasks in the same set of containers (across all hierarchies) will
 share a single container_group object, which holds the actual pointers
 to container state.

Paul,
Some more thoughts, mostly coming from the point of view of
vservers/containers/whaever is the set of tasks sharing a nsproxy is
called.

1. What is the fundamental unit over which resource-management is
applied? Individual tasks or individual containers?

/me thinks latter. In which case, it makes sense to stick 
resource control information in the container somewhere.
Just like when controlling a user's resource consumption, 
'struct user_struct' may be a natural place to put these resource 
limits.

2. Regarding space savings, if 100 tasks are in a container (I dont know
   what is a typical number) -and- lets say that all tasks are to share
   the same resource allocation (which seems to be natural), then having
   a 'struct container_group *' pointer in each task_struct seems to be not 
   very efficient (simply because we dont need that task-level granularity of
   managing resource allocation).

3. This next leads me to think that 'tasks' file in each directory doesnt make 
   sense for containers. In fact it can lend itself to error situations (by 
   administrator/script mistake) when some tasks of a container are in one 
   resource class while others are in a different class.

Instead, from a containers pov, it may be usefull to write
a 'container id' (if such a thing exists) into the tasks file
which will move all the tasks of the container into 
the new resource class. This is the same requirement we
discussed long back of moving all threads of a process into new 
resource class.


-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 10:04:30PM +0530, Srivatsa Vaddagiri wrote:
 2. Regarding space savings, if 100 tasks are in a container (I dont know
what is a typical number) -and- lets say that all tasks are to share
the same resource allocation (which seems to be natural), then having
a 'struct container_group *' pointer in each task_struct seems to be not 
very efficient (simply because we dont need that task-level granularity of
managing resource allocation).

Note that this 'struct container_group *' pointer is in addition to the
'struct nsproxy *' pointer already in task_struct. If the set of tasks
over which resorce control is applied is typically the same set of tasks
which share the same 'struct nsproxy *' pointer, then IMHO 'struct
container_group *' in each task_struct is not very optimal.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Serge E. Hallyn
Quoting Paul Menage ([EMAIL PROTECTED]):
 On 3/7/07, Sam Vilain [EMAIL PROTECTED] wrote:
 
 Ok, they share this characteristic with namespaces: that they group
 processes.

Namespaces have a side effect of grouping processes, but a namespace is
not defined by 'grouping proceses.'  A container is, in fact, a group of
processes.

  So, they conceptually hang off task_struct.  But we put them
 on ns_proxy because we've got this vague notion that things might be
 better that way.
 
 Remember that I'm not the one pushing to move them into ns_proxy.
 These patches are all Srivatsa's work. Despite that fact that they say
 Signed-off-by: Paul Menage, I'd never seen them before they were
 posted to LKML, and I'm not sure that they're the right approach.
 (Although some form of unification might be good).

The nsproxy container subsystem could be said to be that unification.
If we really wanted to I suppose we could now always mount the nsproxy
subsystem, get rid of tsk-nsproxy, and always get thta through it's
nsproxy subsystem container.  But then that causes trouble with being
able to mount a hierarachy like

mount -t container -o ns,cpuset

so we'd have to fix something.  It also slows things down...

  about this you still insist on calling this sub-system specific stuff
  the container,
 
  Uh, no. I'm trying to call a *grouping* of processes a container.
 
 
 Ok, so is this going to supplant the namespaces too?
 
 I don't know. It would be nice to have a single object hanging off the
 task struct that contains all the various grouping pointers. Having

The namespaces aren't grouping pointers, they are resource id tables.

I stand by my earlier observation that placing namespace pointers and
grouping pointers in the same structure means that pointer will end up
pointing to itself.

 something that was flexible enough to handle all the required
 behaviours, or else allowing completely different behaviours for
 different subsets of that structure, could be the fiddly bit.
 
 See my expanded reply to Eric' earlier post for a possible way of
 unifying them, and simplifying the nsproxy and container.c code in the
 process.

Doesn't ring a bell, I'll have to look around for that...

 
   - resource groups (I get a strange feeling of d?j? v? there)
 
 Resource groups isn't a terrible name for them (although I'd be

I still like 'rug' for resource usage groups :)

-serge
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 01:53:57AM +0100, Herbert Poetzl wrote:
  The real trick is that I believe these groupings are designed to
  be something you can setup on login and then not be able to switch
  out of. Which means we can't use sessions and process groups as the
  grouping entities as those have different semantics.
 
 precisely, once you are inside a resource container, you
 must not have the ability to modify its limits, and to
 some degree, you should not know about the actual available
 resources, but only about the artificial limits

From non-container workload management perspective, we do desire dynamic
manipulation of limits associated with a group and also the ability to move 
tasks across resource-classes/groups.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Fri, Mar 09, 2007 at 02:16:08AM +0100, Herbert Poetzl wrote:
 On Thu, Mar 08, 2007 at 05:00:54PM +0530, Srivatsa Vaddagiri wrote:
  On Thu, Mar 08, 2007 at 01:50:01PM +1300, Sam Vilain wrote:
   7. resource namespaces
  
  It should be. Imagine giving 20% bandwidth to a user X. X wants to
  divide this bandwidth further between multi-media (10%), kernel
  compilation (5%) and rest (5%). So,
 
 sounds quite nice, but ...
 
   Is the subservient namespace's resource usage counting against ours too?
  
  Yes, the resource usage of children should be accounted when capping
  parent resource usage.
 
 it will require to do accounting many times
 (and limit checks of course), which in itself
 might be a way to DoS the kernel by creating
 more and more resource groups

I was only pointing out the usefullness of the feature and not
necessarily saying it -should- be implemented! Ofcourse I understand it
will make the controller complicated and thats why probably none of the
recontrollers we are seeing posted on lkml don't support hierarchical
res mgmt.

   Can we dynamically alter the subservient namespace's resource
   allocations?
  
  Should be possible yes. That lets user X completely manage his
  allocation among whatever sub-groups he creates.
 
 what happens if the parent changes, how is
 the resource change (if it was a reduction)
 propagated to the children?

 e.g. your guest has 1024 file handles, now
 you reduce it to 512, but the guest had two
 children, both with 256 file handles each ...

I believe CKRM handled this quite neatly (by defining child shares to be
relative to parent shares). 

In your example, 256+256 add up to 512 which is within the parent's new limit, 
so nothing happens :) You also picked an example of exhaustible/non-reclaimable 
resource, which makes it hard to define what should happen if parent's limit 
goes below 512. Either nothing happens or perhaps a task is killed,
don't know. In case of memory, I would say that some of child's pages may 
get kicked out and in case of cpu, child will start getting fewer
cycles.

  The patches should give visibility to both nsproxy objects (by showing
  what tasks share the same nsproxy objects and letting tasks move across
  nsproxy objects if allowed) and the resource control objects pointed to
  by nsproxy (struct cpuset, struct cpu_limit, struct rss_limit etc).
 
 the nsproxy is not really relevant, as it
 is some kind of strange indirection, which
 does not necessarily depict the real relations,
 regardless wether you do the re-sharing of
 those nsproies or not .. 

So what are you recommending we do instead? My thought was whatever is
the fundamental unit to which resource management needs to be applied,
lets store resource parameters (or pointers to them) there (rather than 
duplicating the information in each task_struct).

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Paul Jackson
Herbert wrote (and vatsa quoted):
 precisely, once you are inside a resource container, you
 must not have the ability to modify its limits, and to
 some degree, you should not know about the actual available
 resources, but only about the artificial limits

Not necessarily.  Depending on the resource we are managing, and on how
all encompassing one chooses to make the virtualization, this might be
an overly Draconian permission model.

Certainly in cpusets, any task can see the the whole system view, if
there are fairly typical permissions on some /dev/cpuset files.  A task
might even be able to change those limits for any task on the system,
if it has stronger priviledge.

Whether or not to really virtualize something, so that the contained
task can't tell it is in side a cacoon, is a somewhat separate question
from whether or not to limit a tasks use of a particular precious
resource.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Herbert Poetzl
On Fri, Mar 09, 2007 at 11:49:08PM +0530, Srivatsa Vaddagiri wrote:
 On Fri, Mar 09, 2007 at 01:53:57AM +0100, Herbert Poetzl wrote:
 The real trick is that I believe these groupings are designed to
 be something you can setup on login and then not be able to switch
 out of. Which means we can't use sessions and process groups as the
 grouping entities as those have different semantics.
 
 precisely, once you are inside a resource container, you
 must not have the ability to modify its limits, and to
 some degree, you should not know about the actual available
 resources, but only about the artificial limits

the emphasis here is on 'from inside' which basically
boils down to the following:

 if you create a 'resource container' to limit the
 usage of a set of resources for the processes
 belonging to this container, it would be kind of
 defeating the purpose, if you'd allow the processes
 to manipulate their limits, no?

 From non-container workload management perspective, we do desire
 dynamic manipulation of limits associated with a group and also the
 ability to move tasks across resource-classes/groups.

the above doesn't mean that there aren't processes
_outside_ the resource container which have the
necessary capabilities to manipulate the container
in any way (changing limits dynamically, moving
tasks in and out of the container, etc ...)

best,
Herbert

 -- 
 Regards,
 vatsa
 ___
 Containers mailing list
 [EMAIL PROTECTED]
 https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Paul Jackson
 the emphasis here is on 'from inside' which basically
 boils down to the following:
 
  if you create a 'resource container' to limit the
  usage of a set of resources for the processes
  belonging to this container, it would be kind of
  defeating the purpose, if you'd allow the processes
  to manipulate their limits, no?

Wrong - this is not the only way.

For instance in cpusets, -any- task in the system, regardless of what
cpuset it is currently assigned to, might be able to manipulate -any-
cpuset in the system.

Yes -- some sufficient mechanism is required to keep tasks from
escalating their resources or capabilities beyond an allowed point.

But that mechanism might not be strictly based on position in some
hierarchy.

In the case of cpusets, it is based on the permissions on files in
the cpuset file system (normally mounted at /dev/cpuset), versus
the current priviledges and capabilities of the task.

A root priviledged task in the smallest leaf node cpuset can manipulate
every cpuset in the system.  This is an ordinary and common occurrence.

I say again, as you seem to be skipping over this detail, one
advantage of basing an API on a file system is the usefulness of
the file system permission model (the -rwxrwxrwx permissions and
the uid/gid owners on each file and directory node).

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Paul Menage

On 3/9/07, Srivatsa Vaddagiri [EMAIL PROTECTED] wrote:


1. What is the fundamental unit over which resource-management is
applied? Individual tasks or individual containers?

/me thinks latter.


Yes


In which case, it makes sense to stick
resource control information in the container somewhere.


Yes, that's what all my patches have been doing.


2. Regarding space savings, if 100 tasks are in a container (I dont know
   what is a typical number) -and- lets say that all tasks are to share
   the same resource allocation (which seems to be natural), then having
   a 'struct container_group *' pointer in each task_struct seems to be not
   very efficient (simply because we dont need that task-level granularity of
   managing resource allocation).


I think you should re-read my patches.

Previously, each task had N pointers, one for its container in each
potential hierarchy. The container_group concept means that each task
has 1 pointer, to a set of container pointers (one per hierarchy)
shared by all tasks that have exactly the same set of containers (in
the various different hierarchies).

It doesn't give task-level granularity of resource management (unless
you create a separate container for each task), it just gives a space
saving.



3. This next leads me to think that 'tasks' file in each directory doesnt make
   sense for containers. In fact it can lend itself to error situations (by
   administrator/script mistake) when some tasks of a container are in one
   resource class while others are in a different class.

Instead, from a containers pov, it may be usefull to write
a 'container id' (if such a thing exists) into the tasks file
which will move all the tasks of the container into
the new resource class. This is the same requirement we
discussed long back of moving all threads of a process into new
resource class.


I think you need to give a more concrete example and use case of what
you're trying to propose here. I don't really see what advantage
you're getting.

Paul
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
I think maybe I didnt communicate what I mean by a container here
(although I thought I did). I am referring to a container in a vserver
context (set of tasks which share the same namespace).

On Fri, Mar 09, 2007 at 02:09:35PM -0800, Paul Menage wrote:
 2. Regarding space savings, if 100 tasks are in a container (I dont know
what is a typical number) -and- lets say that all tasks are to share
the same resource allocation (which seems to be natural), then having
a 'struct container_group *' pointer in each task_struct seems to be not
very efficient (simply because we dont need that task-level granularity 
of
managing resource allocation).
 
 I think you should re-read my patches.
 
 Previously, each task had N pointers, one for its container in each
 potential hierarchy. The container_group concept means that each task
 has 1 pointer, to a set of container pointers (one per hierarchy)
 shared by all tasks that have exactly the same set of containers (in
 the various different hierarchies).

Ok, let me see if I can convey what I had in mind better:

uts_ns pid_ns ipc_ns
\|/
---
   | nsproxy|

 /  |   \\ -- 'nsproxy' pointer
T1  T2  T3 ...T1000
|   |   |  | -- 'containers' pointer (4/8 KB for 1000 task)
   ---
  | container_group   |
   --   
/
 --
| container |
 --
|
 --
| cpu_limit |
 -- 

(T1, T2, T3 ..T1000) are part of a vserver lets say sharing the same
uts/pid/ipc_ns. Now where do we store the resource control information
for this unit/set-of-tasks in your patches?

(tsk-containers-container[cpu_ctlr.hierarchy] + X)-cpu_limit 

(The X is to account for the fact that cotainer structure points to a
'struct container_subsys_state' embedded in some other structure. Its
usually zero if the structure is embedded at the top)

I understand that container_group also points directly to
'struct container_subsys_state', in which case, the above is optimized
to:

(tsk-containers-subsys[cpu_ctlr.subsys_id] + X)-cpu_limit

Did I get that correct?

Compare that to:

   ---
  | cpu_limit |
uts_ns pid_ns ipc_ns   --
\|/ |

   |nsproxy |

 /  |   \|
T1  T2  T3 .T1000

We save on 4/8 KB (for 1000 tasks) by avoiding the 'containers' pointer
in each task_struct (just to get to the resource limit information).

So my observation was (again note primarily from a vserver context): given that 
(T1, T2, T3 ..T1000) will all need to be managed as a unit (because they are 
all sharing the same nsproxy pointer), then having the '-containers' pointer 
in -each- one of them to tell the unit's limit is not optimal. Instead store 
the limit in the proper unit structure (in this case nsproxy - but
whatever else is more suitable vserver datastructure (pid_ns?) which
represent the fundamental unit of res mgmt in vservers).

(I will respond to remaining comments later ..too early in the morning now!)

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Herbert Poetzl
On Sat, Mar 10, 2007 at 12:11:05AM +0530, Srivatsa Vaddagiri wrote:
 On Fri, Mar 09, 2007 at 02:16:08AM +0100, Herbert Poetzl wrote:
  On Thu, Mar 08, 2007 at 05:00:54PM +0530, Srivatsa Vaddagiri wrote:
   On Thu, Mar 08, 2007 at 01:50:01PM +1300, Sam Vilain wrote:
7. resource namespaces
   
   It should be. Imagine giving 20% bandwidth to a user X. X wants to
   divide this bandwidth further between multi-media (10%), kernel
   compilation (5%) and rest (5%). So,
  
  sounds quite nice, but ...
  
Is the subservient namespace's resource usage counting against
ours too?
   
   Yes, the resource usage of children should be accounted when capping
   parent resource usage.
  
  it will require to do accounting many times
  (and limit checks of course), which in itself
  might be a way to DoS the kernel by creating
  more and more resource groups
 
 I was only pointing out the usefullness of the feature and not
 necessarily saying it -should- be implemented! Ofcourse I understand it
 will make the controller complicated and thats why probably none of the
 recontrollers we are seeing posted on lkml don't support hierarchical
 res mgmt.
 
Can we dynamically alter the subservient namespace's resource
allocations?
   
   Should be possible yes. That lets user X completely manage his
   allocation among whatever sub-groups he creates.
  
  what happens if the parent changes, how is
  the resource change (if it was a reduction)
  propagated to the children?
 
  e.g. your guest has 1024 file handles, now
  you reduce it to 512, but the guest had two
  children, both with 256 file handles each ...
 
 I believe CKRM handled this quite neatly (by defining child shares to be
 relative to parent shares). 
 
 In your example, 256+256 add up to 512 which is within the parent's
 new limit, so nothing happens :) 

yes, but that might as well be fatal, because now the
children can easily DoS the parent by using up all the
file handles, where the 'original' setup (2 x 256)
left 512 file handles 'reserved' ...

of course, you could as well have adjusted that to
2 x 128 + 256 for the parent, but that is policy and
IMHO policy does not belong into the kernel, it should
be handled by userspace (maybe invoked by the kernel
in some kind of helper functionality or so)

 You also picked an example of exhaustible/non-reclaimable resource,
 which makes it hard to define what should happen if parent's limit
 goes below 512.

which was quite intentional, and brings us to another
issues when adjusting resource limits (not even in
a hierarchical way)

 Either nothing happens or perhaps a task is killed, don't know.

 In case of memory, I would say that some of child's pages may 
 get kicked out and in case of cpu, child will start getting fewer
 cycles.

btw, kicking out pages when rss limit is reached might
be the obvious choice (if we think Virtual Machine here)
but it might not be the best choice from the overall
performance PoV, which might be much better off by
keeping the page in memory (if there is enough memory
available) but penalizing the guest like the page was
actually kicked out (and needs to be fetched later on)

note: this is something we should think about when we
want to address specific limits like RSS, because IMHO
we should not optimize for the single guest case, but
for the big picture ...

   The patches should give visibility to both nsproxy objects (by
   showing what tasks share the same nsproxy objects and letting
   tasks move across nsproxy objects if allowed) and the resource
   control objects pointed to by nsproxy (struct cpuset, struct
   cpu_limit, struct rss_limit etc).
  
  the nsproxy is not really relevant, as it
  is some kind of strange indirection, which
  does not necessarily depict the real relations,
  regardless wether you do the re-sharing of
  those nsproies or not .. 
 
 So what are you recommending we do instead? 
 My thought was whatever is the fundamental unit to which resource
 management needs to be applied, lets store resource parameters (or
 pointers to them) there (rather than duplicating the information in
 each task_struct).

we do not want to duplicate any information in the task
struct, but we might want to put some (or maybe all)
of the spaces back (as pointer reference) to the task
struct, just to avoid the nsproxy indirection

note that IMHO not all spaces make sense to be separated
e.g. while it is quite useful to have network and pid
space separated, others might be joined to form larger
consistant structures ...

for example, I could as well live with pid and resource
accounting/limits sharing one common struct/space ...
(doesn't mean that separate spaces are not nice :)

best,
Herbert

 -- 
 Regards,
 vatsa
 ___
 Containers mailing list
 [EMAIL PROTECTED]
 https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL 

Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-09 Thread Srivatsa Vaddagiri
On Sat, Mar 10, 2007 at 07:32:20AM +0530, Srivatsa Vaddagiri wrote:
 Ok, let me see if I can convey what I had in mind better:
 
   uts_ns pid_ns ipc_ns
   \|/
   ---
  | nsproxy|
   
  /  |   \\ -- 'nsproxy' pointer
   T1  T2  T3 ...T1000
   |   |   |  | -- 'containers' pointer (4/8 KB for 1000 task)
  ---
 | container_group   |
  --   
   /
--
   | container |
--
   |
--
   | cpu_limit |
-- 

[snip]

 We save on 4/8 KB (for 1000 tasks) by avoiding the 'containers' pointer
 in each task_struct (just to get to the resource limit information).

Having the 'containers' pointer in each task-struct is great from a
non-container res mgmt perspective. It lets you dynamically decide what
is the fundamental unit of res mgmt. 

It could be {T1, T5} tasks/threads of a process, or {T1, T3, T8, T10} tasks of 
a session (for limiting login time per session), or {T1, T2 ..T10, T18, T27} 
tasks of a user etc.

But from a vserver/container pov, this level flexibility (at a -task- level) of 
deciding the unit of res mgmt is IMHO not needed. The
vserver/container/namespace (tsk-nsproxy-some_ns) to which a task 
belongs automatically defines that unit of res mgmt.


-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Jackson
Matt wrote:
> It's like that Star Trek episode ... except we can't agree on the name

Usually, when there is this much heat and smoke over a name, there is
really an underlying disagreement or misunderstanding over the meaning
of something.

The name becomes the proxy for meaning ;).

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Jackson
> The real trick is that I believe these groupings are designed to be something
> you can setup on login and then not be able to switch out of.  Which means
> we can't use sessions and process groups as the grouping entities as those 
> have different semantics.

Not always on login.  For big administered systems, we use batch schedulers
to manage the placement of multiple jobs, submitted to a run queue by users,
onto the available compute resources.

But I agree with your conclusion - the existing task grouping mechanisms,
while useful for some purposes, don't meet the need here.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Jackson
> But "namespace" has well-established historical semantics too - a way
> of changing the mappings of local * to global objects. This
> accurately describes things liek resource controllers, cpusets, resource
> monitoring, etc.

No!

Cpusets don't rename or change the mapping of objects.

I suspect you seriously misunderstand cpusets and are trying to cram them
into a 'namespace' remapping role into which they don't fit.

So far as cpusets are concerned, CPU #17 is CPU #17, for all tasks,
regardless of what cpuset they are in.  They just might not happen to
be allowed to execute on CPU #17 at the moment, because that CPU is not
allowed by the cpuset they are in.

But they still call it CPU #17.

Similary the namespace of cpusets and of tasks (pid's) are single
system-wide namespaces, so far as cpusets are concerned.

Cpusets are not about alternative or multiple or variant name spaces.
They are about (considering just CPUs for the moment):
 1) creating a set of maps M0, M1, ... from the set of CPUs to a Boolean,
 2) creating a mapping Q from the set of tasks to these M0, ... maps, and
 3) imposing constraints on where tasks can run, as follows:
For any task t, that task is allowed to run on CPU x iff Q(t)(x)
is True.  Here, Q(t) will be one of the maps M0, ... aka a cpuset.

So far as cpusets are concerned, there is only one each of:
 A] a namespace numbering CPUs,
 B] a namespace numbering tasks (the process id),
 C] a namespace naming cpusets (the hierarchical name space normally
mounted at /dev/cpuset, and corresponding to the Mn maps above) and
 D] a mapping of tasks to cpusets, system wide (just a map, not a namespace.)

All tasks (of sufficient authority) can see each of these, using a single
system wide name space for each of [A], [B], and [C].

Unless, that is, you call any mapping a "way of changing mappings".
To do so would be a senseless abuse of the phrase, in my view.

More generally, these resource managers all tend to divide some external
limited physical resource into multiple separately allocatable units.

If the resource is amorphous (one atom or cycle of it is interchangeable
with another) then we usually do something like divide it into 100
equal units and speak of percentages.  If the resource is naturally
subdivided into sufficiently small units (sufficient for the
granularity of resource management we require) then we take those units
as is.  Occassionally, as in the 'fake numa node' patch by David
Rientjes <[EMAIL PROTECTED]>, who worked at Google over the
last summer, if the natural units are not of sufficient granularity, we
fake up a somewhat finer division.

Then, in any case, and somewhat separately, we divide the tasks running
on the system into subsets.  More precisely, we partition the tasks,
where a partition of a set is a set of subsets of that set, pairwise
disjoint, whose union equals that set.

Then, finally, we map the task subsets (partition element) to the
resource units, and add hooks in the kernel where this particular
resource is allocated or scheduled to constrain the tasks to only using
the units to which their task partition element is mapped.

These hooks are usually the 'interesting' part of a resource management
patch; one needs to minimize impact on both the kernel source code and
on the runtime performance, and for these hooks, that can be a
challenge.  In particular, what are naturally system wide resource
management stuctures cannot be allowed to impose system wide locks on
critical resource allocation code paths (and it's usually the most
critical resources, such as memory, cpu and network, that we most need
to manage in the first place.)

==> This has nothing to do with remapping namespaces as I might use that
phrase though I cannot claim to be qualified enough to speak on behalf
of the Generally Established Principles of Computer Science.

I am as qualified as anyone to speak on behalf of cpusets, and I suspect
you are not accurately understanding them if you think of them as remapping
namespaces.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Herbert Poetzl
On Thu, Mar 08, 2007 at 05:00:54PM +0530, Srivatsa Vaddagiri wrote:
> On Thu, Mar 08, 2007 at 01:50:01PM +1300, Sam Vilain wrote:
> > 7. resource namespaces
> 
> It should be. Imagine giving 20% bandwidth to a user X. X wants to
> divide this bandwidth further between multi-media (10%), kernel
> compilation (5%) and rest (5%). So,

sounds quite nice, but ...

> > Is the subservient namespace's resource usage counting against ours too?
> 
> Yes, the resource usage of children should be accounted when capping
> parent resource usage.

it will require to do accounting many times
(and limit checks of course), which in itself
might be a way to DoS the kernel by creating
more and more resource groups
> 
> > Can we dynamically alter the subservient namespace's resource
> > allocations?
> 
> Should be possible yes. That lets user X completely manage his
> allocation among whatever sub-groups he creates.

what happens if the parent changes, how is
the resource change (if it was a reduction)
propagated to the children?

e.g. your guest has 1024 file handles, now
you reduce it to 512, but the guest had two
children, both with 256 file handles each ...

> > So let's bring this back to your patches. If they are providing
> > visibility of ns_proxy, then it should be called namesfs or some
> > such.
> 
> The patches should give visibility to both nsproxy objects (by showing
> what tasks share the same nsproxy objects and letting tasks move across
> nsproxy objects if allowed) and the resource control objects pointed to
> by nsproxy (struct cpuset, struct cpu_limit, struct rss_limit etc).

the nsproxy is not really relevant, as it
is some kind of strange indirection, which
does not necessarily depict the real relations,
regardless wether you do the re-sharing of
those nsproies or not .. let me know if you
need examples to verify that ...

best,
Herbert

> > It doesn't really matter if processes disappear from namespace
> > aggregates, because that's what's really happening anyway. The only
> > problem is that if you try to freeze a namespace that has visibility
> > of things at this level, you might not be able to reconstruct the
> > filesystem in the same way. This may or may not be considered a
> > problem, but open filehandles and directory handles etc surviving
> > a freeze/thaw is part of what we're trying to achieve. Then again,
> > perhaps some visibility is better than none for the time being.
> > 
> > If they are restricted entirely to resource control, then don't use
> > the nsproxy directly - use the structure or structures which hang
> > off the nsproxy (or even task_struct) related to resource control.
> 
> -- 
> Regards,
> vatsa
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Herbert Poetzl
On Wed, Mar 07, 2007 at 11:44:58PM -0700, Eric W. Biederman wrote:
> Matt Helsley <[EMAIL PROTECTED]> writes:
> 
> > On Thu, 2007-03-08 at 16:32 +-1300, Sam Vilain wrote:
> >
> > +ADw-snip+AD4
> >
> > +AD4 Kirill, 06032418:36+-03:
> > +AD4 +AD4 I propose to use +ACI-namespace+ACI naming.
> > +AD4 +AD4 1. This is already used in fs.
> > +AD4 +AD4 2. This is what IMHO suites at least OpenVZ/Eric
> > +AD4 +AD4 3. it has good acronym +ACI-ns+ACI.
> > +AD4 
> > +AD4 Right.  So, now I'll also throw into the mix:
> > +AD4 
> > +AD4   - resource groups (I get a strange feeling of d+AOk-j+AOA v+APo 
> > there)
> >
> > +ADw-offtopic+AD4
> > Re: d+AOk-j+AOA v+APo: yes+ACE
> >
> > It's like that Star Trek episode ... except we can't agree on the name
> > of the impossible particle we will invent which solves all our problems.
> > +ADw-/offtopic+AD4
> >
> > At the risk of prolonging the agony I hate to ask: are all of these
> > groupings really concerned with +ACI-resources+ACI?
> >
> > +AD4   - supply chains (think supply and demand)
> > +AD4   - accounting classes
> >
> > CKRM's use of the term +ACI-class+ACI drew negative comments from Paul 
> > Jackson
> > and Andrew Morton about this time last year. That led to my suggestion
> > of +ACI-Resource Groups+ACI. Unless they've changed their minds...
> >
> > +AD4 Do any of those sound remotely close?  If not, your turn :)
> >
> > I'll butt in here: task groups? task sets? confuselets? +ADs)
> 
> Generically we can use subsystem now for the individual pieces without
> confusing anyone.
> 
> I really don't much care as long as we don't start redefining
> container as something else.  I think the IBM guys took it from
> solaris originally which seems to define a zone as a set of
> isolated processes (for us all separate namespaces).  And a container
> as a set of as a zone that uses resource control.  Not exactly how
> we have been using the term but close enough not to confuse someone.
> 
> As long as we don't go calling the individual subsystems or the
> process groups they need to function a container I really don't care.
> 
> I just know that if we use container for just the subsystem level
> it makes effective communication impossible, and code reviews
> essentially impossible.  As the description says one thing the
> reviewer reads it as another and then the patch does not match
> the description.  Leading to NAKs.
> 
> Resource groups at least for subset of subsystems that aren't
> namespaces sounds reasonable.  Heck resource group, resource
> controller, resource subsystem, resource just about anything seems
> sane to me.
> 
> The important part is that we find a vocabulary without doubly
> defined words so we can communicate and a small common set we can
> agree on so people can work on and implement the individual
> resource controllers/groups, and get the individual pieces merged
> as they are reading.

from my personal PoV the following would be fine:

 spaces (for the various 'spaces')

  - similar enough to the old namespace
  - can be easily used with prefix/postfix
like in pid_space, mnt_space, uts_space etc
  - AFAIK, it is not used yet for anything else

 container (for resource accounting/limits)

  - has the 'containment' principle built in :)
  - is used in similar ways in other solutions
  - sounds similar to context (easy to associate)

note: I'm also fine with other names, as long as
we find some useable vocabulary soon, as the
different terms start confusing me on a regular
basis, and we do not go for already used names,
which would clash with Linux-VServer or OpenVZ
terminology (which would confuse the hell out of
the end-users :)

best,
Herbert

> Eric
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Herbert Poetzl
On Wed, Mar 07, 2007 at 05:35:58PM -0800, Paul Menage wrote:
> On 3/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> Pretty much. For most of the other cases I think we are safe
>> referring to them as resource controls or resource limits. I know
>> that roughly covers what cpusets and beancounters and ckrm currently
>> do.
> 
> Plus resource monitoring (which may often be a subset of resource
> control/limits).

we (Linux-VServer) call that resource accounting, and
it is the first step to resource limits ...

> > The real trick is that I believe these groupings are designed to be
> > something you can setup on login and then not be able to switch out
> > of.
> 
> That's going to to be the case for most resource controllers - is that
> the case for namespaces? (e.g. can any task unshare say its mount
> namespace?)

ATM, yes, and there is no real harm in doing so
this would be a problem for resource containers,
unless they are strict hierarchical, i.e. only
allow to further restrict the existing resources
(which might cause some trouble if existing limits
have to be changed at some point)

best,
Herbert

> Paul
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Herbert Poetzl
On Wed, Mar 07, 2007 at 06:32:10PM -0700, Eric W. Biederman wrote:
> "Paul Menage" <[EMAIL PROTECTED]> writes:
> 
>> On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
>>> But "namespace" has well-established historical semantics too - a way
>>> of changing the mappings of local * to global objects. This
>>> accurately describes things liek resource controllers, cpusets, resource
>>> monitoring, etc.
>>
>> Sorry, I think this statement is wrong, by the generally established
>> meaning of the term namespace in computer science.
>>
>>> Trying to extend the well-known term namespace to refer to things
>>> that are semantically equivalent namespaces is a useful approach,
>>> IMHO.
>>
>> Yes, that would be true. But the kinds of groupings that we're talking
>> about are supersets of namespaces, not semantically equivalent to
>> them. To use Eric's "shoe" analogy from earlier, it's like insisting
>> that we use the term "sneaker" to refer to all footware, including ski
>> boots and birkenstocks ...
> 
> Pretty much.  For most of the other cases I think we are safe referring
> to them as resource controls or resource limits.  

> I know that roughly covers what cpusets and beancounters and ckrm
> currently do.

let me tell you, it also covers what Linux-VServer does :)

> The real trick is that I believe these groupings are designed to
> be something you can setup on login and then not be able to switch
> out of. Which means we can't use sessions and process groups as the
> grouping entities as those have different semantics.

precisely, once you are inside a resource container, you
must not have the ability to modify its limits, and to
some degree, you should not know about the actual available
resources, but only about the artificial limits

HTC,
Herbert

> Eric
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Srivatsa Vaddagiri
On Wed, Mar 07, 2007 at 04:16:00PM -0700, Eric W. Biederman wrote:
> I think implementation wise this tends to make sense.
> However it should have nothing to do with semantics.
> 
> If we have a lot of independent resource controllers.  Placing the
> pointer to their data structures directly in nsproxy instead of in
> task_struct sounds like a reasonable idea 

Thats what the rcfs patches do.

> but it should not be user visible.

What do you mean by this? We do want the user to be able to manipulate
the resource parameters (which are normally present in the data
structures/resource objects pointed to by nsproxy -
nsproxy->ctlr_data[])

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Srivatsa Vaddagiri
On Thu, Mar 08, 2007 at 01:50:01PM +1300, Sam Vilain wrote:
> 7. resource namespaces

It should be. Imagine giving 20% bandwidth to a user X. X wants to
divide this bandwidth further between multi-media (10%), kernel
compilation (5%) and rest (5%). So,

> Is the subservient namespace's resource usage counting against ours too?

Yes, the resource usage of children should be accounted when capping
parent resource usage.

> Can we dynamically alter the subservient namespace's resource allocations?

Should be possible yes. That lets user X completely manage his
allocation among whatever sub-groups he creates.

> So let's bring this back to your patches. If they are providing
> visibility of ns_proxy, then it should be called namesfs or some such.

The patches should give visibility to both nsproxy objects (by showing
what tasks share the same nsproxy objects and letting tasks move across
nsproxy objects if allowed) and the resource control objects pointed to
by nsproxy (struct cpuset, struct cpu_limit, struct rss_limit etc).

> It doesn't really matter if processes disappear from namespace
> aggregates, because that's what's really happening anyway. The only
> problem is that if you try to freeze a namespace that has visibility of
> things at this level, you might not be able to reconstruct the
> filesystem in the same way. This may or may not be considered a problem,
> but open filehandles and directory handles etc surviving a freeze/thaw
> is part of what we're trying to achieve. Then again, perhaps some
> visibility is better than none for the time being.
> 
> If they are restricted entirely to resource control, then don't use the
> nsproxy directly - use the structure or structures which hang off the
> nsproxy (or even task_struct) related to resource control.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Menage

On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:


Ok, they share this characteristic with namespaces: that they group
processes.  So, they conceptually hang off task_struct.  But we put them
on ns_proxy because we've got this vague notion that things might be
better that way.


Remember that I'm not the one pushing to move them into ns_proxy.
These patches are all Srivatsa's work. Despite that fact that they say
"Signed-off-by: Paul Menage", I'd never seen them before they were
posted to LKML, and I'm not sure that they're the right approach.
(Although some form of unification might be good).



>> about this you still insist on calling this sub-system specific stuff
>> the "container",
>>
> Uh, no. I'm trying to call a *grouping* of processes a container.
>

Ok, so is this going to supplant the namespaces too?


I don't know. It would be nice to have a single object hanging off the
task struct that contains all the various grouping pointers. Having
something that was flexible enough to handle all the required
behaviours, or else allowing completely different behaviours for
different subsets of that structure, could be the fiddly bit.

See my expanded reply to Eric' earlier post for a possible way of
unifying them, and simplifying the nsproxy and container.c code in the
process.



  - resource groups (I get a strange feeling of déjà vú there)


Resource groups isn't a terrible name for them (although I'd be
wondering whether the BeanCounters folks would object :-) ) but the
intention is that they're more generic than purely for resource
accounting. (E.g. see my other email where I suggested that things
like task->mempolicy and task->user could potentially be treated in
the same way)

Task Group is a good name, except for the fact that it's too easily
confused with process group.



And do we bother changing IPC namespaces or let that one slide?



I think that "namespace" is a fine term for the IPC id
virtualization/restriction that ipc_ns provides. (Unless I'm totally
misunderstanding the concept).

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Srivatsa Vaddagiri
On Thu, Mar 08, 2007 at 01:50:01PM +1300, Sam Vilain wrote:
 7. resource namespaces

It should be. Imagine giving 20% bandwidth to a user X. X wants to
divide this bandwidth further between multi-media (10%), kernel
compilation (5%) and rest (5%). So,

 Is the subservient namespace's resource usage counting against ours too?

Yes, the resource usage of children should be accounted when capping
parent resource usage.

 Can we dynamically alter the subservient namespace's resource allocations?

Should be possible yes. That lets user X completely manage his
allocation among whatever sub-groups he creates.

 So let's bring this back to your patches. If they are providing
 visibility of ns_proxy, then it should be called namesfs or some such.

The patches should give visibility to both nsproxy objects (by showing
what tasks share the same nsproxy objects and letting tasks move across
nsproxy objects if allowed) and the resource control objects pointed to
by nsproxy (struct cpuset, struct cpu_limit, struct rss_limit etc).

 It doesn't really matter if processes disappear from namespace
 aggregates, because that's what's really happening anyway. The only
 problem is that if you try to freeze a namespace that has visibility of
 things at this level, you might not be able to reconstruct the
 filesystem in the same way. This may or may not be considered a problem,
 but open filehandles and directory handles etc surviving a freeze/thaw
 is part of what we're trying to achieve. Then again, perhaps some
 visibility is better than none for the time being.
 
 If they are restricted entirely to resource control, then don't use the
 nsproxy directly - use the structure or structures which hang off the
 nsproxy (or even task_struct) related to resource control.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Srivatsa Vaddagiri
On Wed, Mar 07, 2007 at 04:16:00PM -0700, Eric W. Biederman wrote:
 I think implementation wise this tends to make sense.
 However it should have nothing to do with semantics.
 
 If we have a lot of independent resource controllers.  Placing the
 pointer to their data structures directly in nsproxy instead of in
 task_struct sounds like a reasonable idea 

Thats what the rcfs patches do.

 but it should not be user visible.

What do you mean by this? We do want the user to be able to manipulate
the resource parameters (which are normally present in the data
structures/resource objects pointed to by nsproxy -
nsproxy-ctlr_data[])

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Herbert Poetzl
On Wed, Mar 07, 2007 at 06:32:10PM -0700, Eric W. Biederman wrote:
 Paul Menage [EMAIL PROTECTED] writes:
 
 On 3/7/07, Sam Vilain [EMAIL PROTECTED] wrote:
 But namespace has well-established historical semantics too - a way
 of changing the mappings of local * to global objects. This
 accurately describes things liek resource controllers, cpusets, resource
 monitoring, etc.

 Sorry, I think this statement is wrong, by the generally established
 meaning of the term namespace in computer science.

 Trying to extend the well-known term namespace to refer to things
 that are semantically equivalent namespaces is a useful approach,
 IMHO.

 Yes, that would be true. But the kinds of groupings that we're talking
 about are supersets of namespaces, not semantically equivalent to
 them. To use Eric's shoe analogy from earlier, it's like insisting
 that we use the term sneaker to refer to all footware, including ski
 boots and birkenstocks ...
 
 Pretty much.  For most of the other cases I think we are safe referring
 to them as resource controls or resource limits.  

 I know that roughly covers what cpusets and beancounters and ckrm
 currently do.

let me tell you, it also covers what Linux-VServer does :)

 The real trick is that I believe these groupings are designed to
 be something you can setup on login and then not be able to switch
 out of. Which means we can't use sessions and process groups as the
 grouping entities as those have different semantics.

precisely, once you are inside a resource container, you
must not have the ability to modify its limits, and to
some degree, you should not know about the actual available
resources, but only about the artificial limits

HTC,
Herbert

 Eric
 ___
 Containers mailing list
 [EMAIL PROTECTED]
 https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Herbert Poetzl
On Wed, Mar 07, 2007 at 05:35:58PM -0800, Paul Menage wrote:
 On 3/7/07, Eric W. Biederman [EMAIL PROTECTED] wrote:
 Pretty much. For most of the other cases I think we are safe
 referring to them as resource controls or resource limits. I know
 that roughly covers what cpusets and beancounters and ckrm currently
 do.
 
 Plus resource monitoring (which may often be a subset of resource
 control/limits).

we (Linux-VServer) call that resource accounting, and
it is the first step to resource limits ...

  The real trick is that I believe these groupings are designed to be
  something you can setup on login and then not be able to switch out
  of.
 
 That's going to to be the case for most resource controllers - is that
 the case for namespaces? (e.g. can any task unshare say its mount
 namespace?)

ATM, yes, and there is no real harm in doing so
this would be a problem for resource containers,
unless they are strict hierarchical, i.e. only
allow to further restrict the existing resources
(which might cause some trouble if existing limits
have to be changed at some point)

best,
Herbert

 Paul
 ___
 Containers mailing list
 [EMAIL PROTECTED]
 https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Herbert Poetzl
On Wed, Mar 07, 2007 at 11:44:58PM -0700, Eric W. Biederman wrote:
 Matt Helsley [EMAIL PROTECTED] writes:
 
  On Thu, 2007-03-08 at 16:32 +-1300, Sam Vilain wrote:
 
  +ADw-snip+AD4
 
  +AD4 Kirill, 06032418:36+-03:
  +AD4 +AD4 I propose to use +ACI-namespace+ACI naming.
  +AD4 +AD4 1. This is already used in fs.
  +AD4 +AD4 2. This is what IMHO suites at least OpenVZ/Eric
  +AD4 +AD4 3. it has good acronym +ACI-ns+ACI.
  +AD4 
  +AD4 Right.  So, now I'll also throw into the mix:
  +AD4 
  +AD4   - resource groups (I get a strange feeling of d+AOk-j+AOA v+APo 
  there)
 
  +ADw-offtopic+AD4
  Re: d+AOk-j+AOA v+APo: yes+ACE
 
  It's like that Star Trek episode ... except we can't agree on the name
  of the impossible particle we will invent which solves all our problems.
  +ADw-/offtopic+AD4
 
  At the risk of prolonging the agony I hate to ask: are all of these
  groupings really concerned with +ACI-resources+ACI?
 
  +AD4   - supply chains (think supply and demand)
  +AD4   - accounting classes
 
  CKRM's use of the term +ACI-class+ACI drew negative comments from Paul 
  Jackson
  and Andrew Morton about this time last year. That led to my suggestion
  of +ACI-Resource Groups+ACI. Unless they've changed their minds...
 
  +AD4 Do any of those sound remotely close?  If not, your turn :)
 
  I'll butt in here: task groups? task sets? confuselets? +ADs)
 
 Generically we can use subsystem now for the individual pieces without
 confusing anyone.
 
 I really don't much care as long as we don't start redefining
 container as something else.  I think the IBM guys took it from
 solaris originally which seems to define a zone as a set of
 isolated processes (for us all separate namespaces).  And a container
 as a set of as a zone that uses resource control.  Not exactly how
 we have been using the term but close enough not to confuse someone.
 
 As long as we don't go calling the individual subsystems or the
 process groups they need to function a container I really don't care.
 
 I just know that if we use container for just the subsystem level
 it makes effective communication impossible, and code reviews
 essentially impossible.  As the description says one thing the
 reviewer reads it as another and then the patch does not match
 the description.  Leading to NAKs.
 
 Resource groups at least for subset of subsystems that aren't
 namespaces sounds reasonable.  Heck resource group, resource
 controller, resource subsystem, resource just about anything seems
 sane to me.
 
 The important part is that we find a vocabulary without doubly
 defined words so we can communicate and a small common set we can
 agree on so people can work on and implement the individual
 resource controllers/groups, and get the individual pieces merged
 as they are reading.

from my personal PoV the following would be fine:

 spaces (for the various 'spaces')

  - similar enough to the old namespace
  - can be easily used with prefix/postfix
like in pid_space, mnt_space, uts_space etc
  - AFAIK, it is not used yet for anything else

 container (for resource accounting/limits)

  - has the 'containment' principle built in :)
  - is used in similar ways in other solutions
  - sounds similar to context (easy to associate)

note: I'm also fine with other names, as long as
we find some useable vocabulary soon, as the
different terms start confusing me on a regular
basis, and we do not go for already used names,
which would clash with Linux-VServer or OpenVZ
terminology (which would confuse the hell out of
the end-users :)

best,
Herbert

 Eric
 ___
 Containers mailing list
 [EMAIL PROTECTED]
 https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Herbert Poetzl
On Thu, Mar 08, 2007 at 05:00:54PM +0530, Srivatsa Vaddagiri wrote:
 On Thu, Mar 08, 2007 at 01:50:01PM +1300, Sam Vilain wrote:
  7. resource namespaces
 
 It should be. Imagine giving 20% bandwidth to a user X. X wants to
 divide this bandwidth further between multi-media (10%), kernel
 compilation (5%) and rest (5%). So,

sounds quite nice, but ...

  Is the subservient namespace's resource usage counting against ours too?
 
 Yes, the resource usage of children should be accounted when capping
 parent resource usage.

it will require to do accounting many times
(and limit checks of course), which in itself
might be a way to DoS the kernel by creating
more and more resource groups
 
  Can we dynamically alter the subservient namespace's resource
  allocations?
 
 Should be possible yes. That lets user X completely manage his
 allocation among whatever sub-groups he creates.

what happens if the parent changes, how is
the resource change (if it was a reduction)
propagated to the children?

e.g. your guest has 1024 file handles, now
you reduce it to 512, but the guest had two
children, both with 256 file handles each ...

  So let's bring this back to your patches. If they are providing
  visibility of ns_proxy, then it should be called namesfs or some
  such.
 
 The patches should give visibility to both nsproxy objects (by showing
 what tasks share the same nsproxy objects and letting tasks move across
 nsproxy objects if allowed) and the resource control objects pointed to
 by nsproxy (struct cpuset, struct cpu_limit, struct rss_limit etc).

the nsproxy is not really relevant, as it
is some kind of strange indirection, which
does not necessarily depict the real relations,
regardless wether you do the re-sharing of
those nsproies or not .. let me know if you
need examples to verify that ...

best,
Herbert

  It doesn't really matter if processes disappear from namespace
  aggregates, because that's what's really happening anyway. The only
  problem is that if you try to freeze a namespace that has visibility
  of things at this level, you might not be able to reconstruct the
  filesystem in the same way. This may or may not be considered a
  problem, but open filehandles and directory handles etc surviving
  a freeze/thaw is part of what we're trying to achieve. Then again,
  perhaps some visibility is better than none for the time being.
  
  If they are restricted entirely to resource control, then don't use
  the nsproxy directly - use the structure or structures which hang
  off the nsproxy (or even task_struct) related to resource control.
 
 -- 
 Regards,
 vatsa
 ___
 Containers mailing list
 [EMAIL PROTECTED]
 https://lists.osdl.org/mailman/listinfo/containers
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Jackson
 But namespace has well-established historical semantics too - a way
 of changing the mappings of local * to global objects. This
 accurately describes things liek resource controllers, cpusets, resource
 monitoring, etc.

No!

Cpusets don't rename or change the mapping of objects.

I suspect you seriously misunderstand cpusets and are trying to cram them
into a 'namespace' remapping role into which they don't fit.

So far as cpusets are concerned, CPU #17 is CPU #17, for all tasks,
regardless of what cpuset they are in.  They just might not happen to
be allowed to execute on CPU #17 at the moment, because that CPU is not
allowed by the cpuset they are in.

But they still call it CPU #17.

Similary the namespace of cpusets and of tasks (pid's) are single
system-wide namespaces, so far as cpusets are concerned.

Cpusets are not about alternative or multiple or variant name spaces.
They are about (considering just CPUs for the moment):
 1) creating a set of maps M0, M1, ... from the set of CPUs to a Boolean,
 2) creating a mapping Q from the set of tasks to these M0, ... maps, and
 3) imposing constraints on where tasks can run, as follows:
For any task t, that task is allowed to run on CPU x iff Q(t)(x)
is True.  Here, Q(t) will be one of the maps M0, ... aka a cpuset.

So far as cpusets are concerned, there is only one each of:
 A] a namespace numbering CPUs,
 B] a namespace numbering tasks (the process id),
 C] a namespace naming cpusets (the hierarchical name space normally
mounted at /dev/cpuset, and corresponding to the Mn maps above) and
 D] a mapping of tasks to cpusets, system wide (just a map, not a namespace.)

All tasks (of sufficient authority) can see each of these, using a single
system wide name space for each of [A], [B], and [C].

Unless, that is, you call any mapping a way of changing mappings.
To do so would be a senseless abuse of the phrase, in my view.

More generally, these resource managers all tend to divide some external
limited physical resource into multiple separately allocatable units.

If the resource is amorphous (one atom or cycle of it is interchangeable
with another) then we usually do something like divide it into 100
equal units and speak of percentages.  If the resource is naturally
subdivided into sufficiently small units (sufficient for the
granularity of resource management we require) then we take those units
as is.  Occassionally, as in the 'fake numa node' patch by David
Rientjes [EMAIL PROTECTED], who worked at Google over the
last summer, if the natural units are not of sufficient granularity, we
fake up a somewhat finer division.

Then, in any case, and somewhat separately, we divide the tasks running
on the system into subsets.  More precisely, we partition the tasks,
where a partition of a set is a set of subsets of that set, pairwise
disjoint, whose union equals that set.

Then, finally, we map the task subsets (partition element) to the
resource units, and add hooks in the kernel where this particular
resource is allocated or scheduled to constrain the tasks to only using
the units to which their task partition element is mapped.

These hooks are usually the 'interesting' part of a resource management
patch; one needs to minimize impact on both the kernel source code and
on the runtime performance, and for these hooks, that can be a
challenge.  In particular, what are naturally system wide resource
management stuctures cannot be allowed to impose system wide locks on
critical resource allocation code paths (and it's usually the most
critical resources, such as memory, cpu and network, that we most need
to manage in the first place.)

== This has nothing to do with remapping namespaces as I might use that
phrase though I cannot claim to be qualified enough to speak on behalf
of the Generally Established Principles of Computer Science.

I am as qualified as anyone to speak on behalf of cpusets, and I suspect
you are not accurately understanding them if you think of them as remapping
namespaces.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Jackson
 The real trick is that I believe these groupings are designed to be something
 you can setup on login and then not be able to switch out of.  Which means
 we can't use sessions and process groups as the grouping entities as those 
 have different semantics.

Not always on login.  For big administered systems, we use batch schedulers
to manage the placement of multiple jobs, submitted to a run queue by users,
onto the available compute resources.

But I agree with your conclusion - the existing task grouping mechanisms,
while useful for some purposes, don't meet the need here.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Jackson
Matt wrote:
 It's like that Star Trek episode ... except we can't agree on the name

Usually, when there is this much heat and smoke over a name, there is
really an underlying disagreement or misunderstanding over the meaning
of something.

The name becomes the proxy for meaning ;).

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.925.600.0401
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-08 Thread Paul Menage

On 3/7/07, Sam Vilain [EMAIL PROTECTED] wrote:


Ok, they share this characteristic with namespaces: that they group
processes.  So, they conceptually hang off task_struct.  But we put them
on ns_proxy because we've got this vague notion that things might be
better that way.


Remember that I'm not the one pushing to move them into ns_proxy.
These patches are all Srivatsa's work. Despite that fact that they say
Signed-off-by: Paul Menage, I'd never seen them before they were
posted to LKML, and I'm not sure that they're the right approach.
(Although some form of unification might be good).



 about this you still insist on calling this sub-system specific stuff
 the container,

 Uh, no. I'm trying to call a *grouping* of processes a container.


Ok, so is this going to supplant the namespaces too?


I don't know. It would be nice to have a single object hanging off the
task struct that contains all the various grouping pointers. Having
something that was flexible enough to handle all the required
behaviours, or else allowing completely different behaviours for
different subsets of that structure, could be the fiddly bit.

See my expanded reply to Eric' earlier post for a possible way of
unifying them, and simplifying the nsproxy and container.c code in the
process.



  - resource groups (I get a strange feeling of déjà vú there)


Resource groups isn't a terrible name for them (although I'd be
wondering whether the BeanCounters folks would object :-) ) but the
intention is that they're more generic than purely for resource
accounting. (E.g. see my other email where I suggested that things
like task-mempolicy and task-user could potentially be treated in
the same way)

Task Group is a good name, except for the fact that it's too easily
confused with process group.



And do we bother changing IPC namespaces or let that one slide?



I think that namespace is a fine term for the IPC id
virtualization/restriction that ipc_ns provides. (Unless I'm totally
misunderstanding the concept).

Paul
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman
Matt Helsley <[EMAIL PROTECTED]> writes:

> On Thu, 2007-03-08 at 16:32 +-1300, Sam Vilain wrote:
>
> +ADw-snip+AD4
>
> +AD4 Kirill, 06032418:36+-03:
> +AD4 +AD4 I propose to use +ACI-namespace+ACI naming.
> +AD4 +AD4 1. This is already used in fs.
> +AD4 +AD4 2. This is what IMHO suites at least OpenVZ/Eric
> +AD4 +AD4 3. it has good acronym +ACI-ns+ACI.
> +AD4 
> +AD4 Right.  So, now I'll also throw into the mix:
> +AD4 
> +AD4   - resource groups (I get a strange feeling of d+AOk-j+AOA v+APo there)
>
> +ADw-offtopic+AD4
> Re: d+AOk-j+AOA v+APo: yes+ACE
>
> It's like that Star Trek episode ... except we can't agree on the name
> of the impossible particle we will invent which solves all our problems.
> +ADw-/offtopic+AD4
>
> At the risk of prolonging the agony I hate to ask: are all of these
> groupings really concerned with +ACI-resources+ACI?
>
> +AD4   - supply chains (think supply and demand)
> +AD4   - accounting classes
>
> CKRM's use of the term +ACI-class+ACI drew negative comments from Paul Jackson
> and Andrew Morton about this time last year. That led to my suggestion
> of +ACI-Resource Groups+ACI. Unless they've changed their minds...
>
> +AD4 Do any of those sound remotely close?  If not, your turn :)
>
> I'll butt in here: task groups? task sets? confuselets? +ADs)

Generically we can use subsystem now for the individual pieces without
confusing anyone.

I really don't much care as long as we don't start redefining
container as something else.  I think the IBM guys took it from
solaris originally which seems to define a zone as a set of
isolated processes (for us all separate namespaces).  And a container
as a set of as a zone that uses resource control.  Not exactly how
we have been using the term but close enough not to confuse someone.

As long as we don't go calling the individual subsystems or the
process groups they need to function a container I really don't care.

I just know that if we use container for just the subsystem level
it makes effective communication impossible, and code reviews
essentially impossible.  As the description says one thing the
reviewer reads it as another and then the patch does not match
the description.  Leading to NAKs.

Resource groups at least for subset of subsystems that aren't
namespaces sounds reasonable.  Heck resource group, resource
controller, resource subsystem, resource just about anything seems
sane to me.

The important part is that we find a vocabulary without doubly
defined words so we can communicate and a small common set we can
agree on so people can work on and implement the individual
resource controllers/groups, and get the individual pieces merged
as they are reading.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman
Sam Vilain <[EMAIL PROTECTED]> writes:

> And do we bother changing IPC namespaces or let that one slide?

ipc namespaces works (if you worry about tiny details like we put
the resource limits for the sysv ipc objects inside the namespace).

Probably the most instructive example of this is that you can you
map a sysv ipc shared memory segment with shmat and then switch to
another sysvipc namespace you still have access by reads and writes
to that shared memory segment but you cannot manipulate it because it
doesn't have a name.

Either that or look at the output of ipcs, before and after an unshare.

SYSVIPC really doesn't have it's own (very weird) set of global names
and that is essentially all the ipc namespace deals with.

I think you have the sysvipc namespace confused with something else
though (like signal sending).

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Matt Helsley
On Thu, 2007-03-08 at 16:32 +1300, Sam Vilain wrote:



> Kirill, 06032418:36+03:
> > I propose to use "namespace" naming.
> > 1. This is already used in fs.
> > 2. This is what IMHO suites at least OpenVZ/Eric
> > 3. it has good acronym "ns".
> 
> Right.  So, now I'll also throw into the mix:
> 
>   - resource groups (I get a strange feeling of déjà vú there)


Re: déjà vú: yes!

It's like that Star Trek episode ... except we can't agree on the name
of the impossible particle we will invent which solves all our problems.


At the risk of prolonging the agony I hate to ask: are all of these
groupings really concerned with "resources"?

>   - supply chains (think supply and demand)
>   - accounting classes

CKRM's use of the term "class" drew negative comments from Paul Jackson
and Andrew Morton about this time last year. That led to my suggestion
of "Resource Groups". Unless they've changed their minds...

> Do any of those sound remotely close?  If not, your turn :)

I'll butt in here: task groups? task sets? confuselets? ;)

Cheers,
-Matt Helsley

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Sam Vilain
Paul Menage wrote:
> I made sure to check [...]wikipedia.org[...] when this argument started ... 
> :-)
>   

Wikipedia?!  That's not a referen[...]

oh bugger it.  I've vented enough today and we're on the same page now I
think.

>> This is the classic terminology problem between substance and function.
>> ie, some things share characteristics but does that mean they are the
>> same thing?
>> 
>
> Aren't you arguing my side here? My point is that what I'm trying to
> add with "containers" (or whatever name we end up using) can't easily
> be subsumed into the "namespace" concept, and you're arguing that they
> should go into nsproxy because they share some characteristics.
>   

Ok, they share this characteristic with namespaces: that they group
processes.  So, they conceptually hang off task_struct.  But we put them
on ns_proxy because we've got this vague notion that things might be
better that way.

>> about this you still insist on calling this sub-system specific stuff
>> the "container",
>> 
> Uh, no. I'm trying to call a *grouping* of processes a container.
>   

Ok, so is this going to supplant the namespaces too?

>> and then go screaming that I am wrong and you are right
>> on terminology.
>> 
>
> Actually I asked if you/Eric had better suggestions.
>   

Cool, let's review them.

Me, 07921311:38+12:
> This would suggesting re-write this patchset, part 2 as a "CPUSet
> namespace", part 4 as a "CPU scheduling namespace", parts 5 and 6 as
> "Resource Limits Namespace" (drop this "BeanCounter" brand), and of
> course part 7 falls away.
Me, 07022110:58+12:
> Did you like the names I came up with in my original reply?
>  - CPUset namespace for CPU partitioning
>  - Resource namespaces:
>- cpusched namespace for CPU
>- ulimit namespace for memory
>- quota namespace for disk space
>- io namespace for disk activity
>- etc

Ok, there's nothing original or useful there; I'm obviously quite deliberately 
still punting on the issue.

Eric, 07030718:32-07:
> Pretty much.  For most of the other cases I think we are safe referring
> to them as resource controls or resource limits.I know that roughly
> covers what cpusets and beancounters and ckrm currently do.

Let's go back in time to the thread I referred to:

Me, 06032209:08+12 and nearby posts
>  - "vserver" spelt in full
>  - family
>  - container
>  - jail
>  - task_ns (sort for namespace)
> Using the term "box" and ID term "boxid":
> create_space - creates a new space and "hashes" it

Kirill, 06032418:36+03:
> I propose to use "namespace" naming.
> 1. This is already used in fs.
> 2. This is what IMHO suites at least OpenVZ/Eric
> 3. it has good acronym "ns".

Right.  So, now I'll also throw into the mix:

  - resource groups (I get a strange feeling of déjà vú there)
  - supply chains (think supply and demand)
  - accounting classes

Do any of those sound remotely close?  If not, your turn :)

And do we bother changing IPC namespaces or let that one slide?

Sam.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Paul Menage

On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:


Sorry, I didn't realise I was talking with somebody qualified enough to
speak on behalf of the Generally Established Principles of Computer Science.


I made sure to check

http://en.wikipedia.org/wiki/Namespace
http://en.wikipedia.org/wiki/Namespace_%28computer_science%29

when this argument started ... :-)



This is the classic terminology problem between substance and function.
ie, some things share characteristics but does that mean they are the
same thing?


Aren't you arguing my side here? My point is that what I'm trying to
add with "containers" (or whatever name we end up using) can't easily
be subsumed into the "namespace" concept, and you're arguing that they
should go into nsproxy because they share some characteristics.



Look, I already agreed in the earlier thread that the term "namespace"
was being stretched beyond belief, yet instead of trying to be useful
about this you still insist on calling this sub-system specific stuff
the "container",


Uh, no. I'm trying to call a *grouping* of processes a container.


and then go screaming that I am wrong and you are right
on terminology.


Actually I asked if you/Eric had better suggestions.

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Sam Vilain
Paul Menage wrote:
> Sorry, I think this statement is wrong, by the generally established
> meaning of the term namespace in computer science.
>   

Sorry, I didn't realise I was talking with somebody qualified enough to
speak on behalf of the Generally Established Principles of Computer Science.

>> Trying to extend the well-known term namespace to refer to thingsthat
>> are semantically equivalent namespaces is a useful approach, IMHO.
>>
>> 
> Yes, that would be true. But the kinds of groupings that we're talking
> about are supersets of namespaces, not semantically equivalent to
> them. To use Eric's "shoe" analogy from earlier, it's like insisting
> that we use the term "sneaker" to refer to all footware, including ski
> boots and birkenstocks ...
>   

I see it more like insisting that we use the term "clothing" to also
refer to "weapons" because for both of them you tell your body to "wear"
them in some game.

This is the classic terminology problem between substance and function. 
ie, some things share characteristics but does that mean they are the
same thing?

Look, I already agreed in the earlier thread that the term "namespace"
was being stretched beyond belief, yet instead of trying to be useful
about this you still insist on calling this sub-system specific stuff
the "container", and then go screaming that I am wrong and you are right
on terminology.

I've normally recognised[1] these three things as the primary feature
groups of vserver:

  - isolation
  - resource limiting
  - resource sharing

So I've got no problem with using "clothing" remaining for isolation and
"weapons" for resource sharing and limiting.  Or some other suitable terms.

Sam.

1. eg, http://utsl.gen.nz/talks/vserver/slide4c.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman
"Paul Menage" <[EMAIL PROTECTED]> writes:

> On 3/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:
>> The real trick is that I believe these groupings are designed to be something
>> you can setup on login and then not be able to switch out of.
>
> That's going to to be the case for most resource controllers - is that
> the case for namespaces? (e.g. can any task unshare say its mount
> namespace?)

With namespaces there are secondary issues with unsharing.  Weird things
like a simple unshare might allow you to replace /etc/shadow and thus
mess up a suid root application.

Once people have worked through those secondary issues unsharing of
namespaces is likely allowable (for someone without CAP_SYS_ADMIN).
Although if you pick the truly hierarchical namespaces the pid
namespace unsharing will simply give you a parent of the current
namespace.

For resource controls I expect unsharing is likely to be like the pid
namespace.  You might allow it but if you do you are forced to be a
child and possible there will be hierarchy depth restrictions.
Assuming you can implement hierarchical accounting without to much
expense.



Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Paul Menage

On 3/7/07, Eric W. Biederman <[EMAIL PROTECTED]> wrote:

Pretty much.  For most of the other cases I think we are safe referring
to them as resource controls or resource limits.I know that roughly covers
what cpusets and beancounters and ckrm currently do.


Plus resource monitoring (which may often be a subset of resource
control/limits).



The real trick is that I believe these groupings are designed to be something
you can setup on login and then not be able to switch out of.


That's going to to be the case for most resource controllers - is that
the case for namespaces? (e.g. can any task unshare say its mount
namespace?)

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Eric W. Biederman
"Paul Menage" <[EMAIL PROTECTED]> writes:

> On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:
>> But "namespace" has well-established historical semantics too - a way
>> of changing the mappings of local * to global objects. This
>> accurately describes things liek resource controllers, cpusets, resource
>> monitoring, etc.
>
> Sorry, I think this statement is wrong, by the generally established
> meaning of the term namespace in computer science.
>
>>
>> Trying to extend the well-known term namespace to refer to things that
>> are semantically equivalent namespaces is a useful approach, IMHO.
>>
>
> Yes, that would be true. But the kinds of groupings that we're talking
> about are supersets of namespaces, not semantically equivalent to
> them. To use Eric's "shoe" analogy from earlier, it's like insisting
> that we use the term "sneaker" to refer to all footware, including ski
> boots and birkenstocks ...

Pretty much.  For most of the other cases I think we are safe referring
to them as resource controls or resource limits.I know that roughly covers
what cpusets and beancounters and ckrm currently do.

The real trick is that I believe these groupings are designed to be something
you can setup on login and then not be able to switch out of.  Which means
we can't use sessions and process groups as the grouping entities as those 
have different semantics.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Paul Menage

On 3/7/07, Sam Vilain <[EMAIL PROTECTED]> wrote:

But "namespace" has well-established historical semantics too - a way
of changing the mappings of local * to global objects. This
accurately describes things liek resource controllers, cpusets, resource
monitoring, etc.


Sorry, I think this statement is wrong, by the generally established
meaning of the term namespace in computer science.



Trying to extend the well-known term namespace to refer to things that
are semantically equivalent namespaces is a useful approach, IMHO.



Yes, that would be true. But the kinds of groupings that we're talking
about are supersets of namespaces, not semantically equivalent to
them. To use Eric's "shoe" analogy from earlier, it's like insisting
that we use the term "sneaker" to refer to all footware, including ski
boots and birkenstocks ...

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >