Re: [ckrm-tech] [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-03-24 Thread Srivatsa Vaddagiri
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote:
> +static int attach_task(struct container *cont, struct task_struct *tsk)
>  {

[snip]

> + for_each_subsys(h, ss) {
> + if (ss->can_attach) {
> + retval = ss->can_attach(ss, cont, tsk);
> + if (retval) {
> + put_task_struct(tsk);

We don't need this put_task_struct(), since our caller
attach_task_by_pid() would do it for us.

> + return retval;
> + }
> + }
>   }
> -}
> 
> + /* Locate or allocate a new container_group for this task,
> +  * based on its final set of containers */
> + oldcg = tsk->containers;
> + newcg = find_container_group(oldcg, cont);
> + if (!newcg) {
> + put_task_struct(tsk);

Ditto

> + return -ENOMEM;
> + }

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [ckrm-tech] [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-03-24 Thread Srivatsa Vaddagiri
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote:
 +static int attach_task(struct container *cont, struct task_struct *tsk)
  {

[snip]

 + for_each_subsys(h, ss) {
 + if (ss-can_attach) {
 + retval = ss-can_attach(ss, cont, tsk);
 + if (retval) {
 + put_task_struct(tsk);

We don't need this put_task_struct(), since our caller
attach_task_by_pid() would do it for us.

 + return retval;
 + }
 + }
   }
 -}
 
 + /* Locate or allocate a new container_group for this task,
 +  * based on its final set of containers */
 + oldcg = tsk-containers;
 + newcg = find_container_group(oldcg, cont);
 + if (!newcg) {
 + put_task_struct(tsk);

Ditto

 + return -ENOMEM;
 + }

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-03-08 Thread Srivatsa Vaddagiri
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote:
> +static struct container_group *find_container_group(
> + struct container_group *oldcg, struct container *cont)
> +{
> + struct container_group *res;
> + struct container_subsys *ss;
> + int h = cont->hierarchy;
> + int i;
> +
> + BUG_ON(oldcg->container[h] == cont);
> + /* First see if we already have a container group that matches
> +  * the desired set */
> + spin_lock(_group_lock);

I think this should be spin_lock_irq().

I have been bitten by lockdep warning in the rcfs patchs. But I think
they should apply here as well.

Essentially, container_task_count() is called from cpuset.c after doing
write_lock_irq(tasklist_lock). This creates the dependency:

tasklist_lock (irqs disabled)  -> container_group_lock

However in above function, find_container_group (and several other
functions) we take container_group_lock w/o blocking interrupts. An
interrupt handler now trying to take tasklist_lock will cause a
deadlock.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-03-08 Thread Srivatsa Vaddagiri
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote:
 +static struct container_group *find_container_group(
 + struct container_group *oldcg, struct container *cont)
 +{
 + struct container_group *res;
 + struct container_subsys *ss;
 + int h = cont-hierarchy;
 + int i;
 +
 + BUG_ON(oldcg-container[h] == cont);
 + /* First see if we already have a container group that matches
 +  * the desired set */
 + spin_lock(container_group_lock);

I think this should be spin_lock_irq().

I have been bitten by lockdep warning in the rcfs patchs. But I think
they should apply here as well.

Essentially, container_task_count() is called from cpuset.c after doing
write_lock_irq(tasklist_lock). This creates the dependency:

tasklist_lock (irqs disabled)  - container_group_lock

However in above function, find_container_group (and several other
functions) we take container_group_lock w/o blocking interrupts. An
interrupt handler now trying to take tasklist_lock will cause a
deadlock.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-14 Thread Paul Menage

On 2/13/07, Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:


Well, we already bump up reference count in fork() w/o grabbing those
mutexes don't we? Also if rmdir() sees container->count to be zero, then
it means no task is attached to the container. How will then a function
like bc_file_charge() bump up the reference count to such a container
(presuming it wanted to do so w/o manage/callback mutexes -and- that the
container pointer in bc_file_charge is derived from some task in
that container). I think it is safe to bump up container->count in
bc_file_charge w/o grabbing manage/callback mutexes.


Right, I was never suggesting that we take either of those mutexes for
this operation. The spin lock in css_get() was an attempt to avoid
that. But I think you're right that it was too heavyweight, and can be
avoided with atomic operations. See my other email to Pavel.



Are you talking about (un)bind of subsystem to/from hierararchies that
have non-zero containers in them? That sounds very icky. Anyway that
doesnt seem to be supported in current patches.


The bind/unbind from active hierarchies is supported in the user-space
API, and it's implemented for hierarchies that have no child
containers. Hence it's important, at least conceptually, for the
reference count to be held by the subsystem state rather than the
container.

Implementing a full bind/unbind for arbitrary subsystems and
hierarchies will indeed be a lot of work, which is why I'm not trying
to do it at this point.

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-14 Thread Balbir Singh

[EMAIL PROTECTED] wrote:

This patch removes all cpuset-specific knowlege from the container
system, replacing it with a generic API that can be used by multiple
subsystems. Cpusets is adapted to be a container subsystem.

Signed-off-by: Paul Menage <[EMAIL PROTECTED]>



Hi, Paul,

This patch fails to apply for me

[EMAIL PROTECTED]:~/ocbalbir/images/kernels/containers/linux-2.6.20$ 
pushpatch

patching file include/linux/container.h
patching file include/linux/cpuset.h
patching file kernel/container.c
patch:  malformed patch at line 640: @@ -202,15 +418,18 @@ static 
DEFINE_MUTEX(callback_mutex);


multiuser_container does not apply

Is anybody else seeing this problem?

--
Balbir Singh
Linux Technology Center
IBM, ISTL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-14 Thread Balbir Singh

[EMAIL PROTECTED] wrote:

This patch removes all cpuset-specific knowlege from the container
system, replacing it with a generic API that can be used by multiple
subsystems. Cpusets is adapted to be a container subsystem.

Signed-off-by: Paul Menage [EMAIL PROTECTED]



Hi, Paul,

This patch fails to apply for me

[EMAIL PROTECTED]:~/ocbalbir/images/kernels/containers/linux-2.6.20$ 
pushpatch

patching file include/linux/container.h
patching file include/linux/cpuset.h
patching file kernel/container.c
patch:  malformed patch at line 640: @@ -202,15 +418,18 @@ static 
DEFINE_MUTEX(callback_mutex);


multiuser_container does not apply

Is anybody else seeing this problem?

--
Balbir Singh
Linux Technology Center
IBM, ISTL
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-14 Thread Paul Menage

On 2/13/07, Srivatsa Vaddagiri [EMAIL PROTECTED] wrote:


Well, we already bump up reference count in fork() w/o grabbing those
mutexes don't we? Also if rmdir() sees container-count to be zero, then
it means no task is attached to the container. How will then a function
like bc_file_charge() bump up the reference count to such a container
(presuming it wanted to do so w/o manage/callback mutexes -and- that the
container pointer in bc_file_charge is derived from some task in
that container). I think it is safe to bump up container-count in
bc_file_charge w/o grabbing manage/callback mutexes.


Right, I was never suggesting that we take either of those mutexes for
this operation. The spin lock in css_get() was an attempt to avoid
that. But I think you're right that it was too heavyweight, and can be
avoided with atomic operations. See my other email to Pavel.



Are you talking about (un)bind of subsystem to/from hierararchies that
have non-zero containers in them? That sounds very icky. Anyway that
doesnt seem to be supported in current patches.


The bind/unbind from active hierarchies is supported in the user-space
API, and it's implemented for hierarchies that have no child
containers. Hence it's important, at least conceptually, for the
reference count to be held by the subsystem state rather than the
container.

Implementing a full bind/unbind for arbitrary subsystems and
hierarchies will indeed be a lot of work, which is why I'm not trying
to do it at this point.

Paul
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-13 Thread Srivatsa Vaddagiri
On Mon, Feb 12, 2007 at 10:40:52AM -0800, Paul Menage wrote:
> I did consider that approach at one point. The reason I rejected it
> was that then container->count would no longer even vaguely represent
> the number of processes in a container. Now that we have the
> container_group object, we have to use that for counting the number of
> processes in a container anyway, so that objection goes away.

Yep.

> However, I think it's important to be able to provide some kind of a
> reference count that subsystems can grab (e.g. to store a reference in
> a non-task object such as a file struct) without taking manage_mutex
> or callback_mutex (since that would be excessively heavyweight) but
> which can still be "frozen" at zero at the point when you're trying to
> destroy a container. 

Well, we already bump up reference count in fork() w/o grabbing those
mutexes don't we? Also if rmdir() sees container->count to be zero, then
it means no task is attached to the container. How will then a function
like bc_file_charge() bump up the reference count to such a container
(presuming it wanted to do so w/o manage/callback mutexes -and- that the
container pointer in bc_file_charge is derived from some task in 
that container). I think it is safe to bump up container->count in
bc_file_charge w/o grabbing manage/callback mutexes.

> Additionally, having it per subsystem will be
> important for when we implement arbitrary binding/unbinding of
> subsystems from hierarchies - at that point we need to be able know
> which subsystems have external reference counts, and hence aren't
> removeable.

Are you talking about (un)bind of subsystem to/from hierararchies that
have non-zero containers in them? That sounds very icky. Anyway that
doesnt seem to be supported in current patches.

Basically I felt we should defer introducing css_get/put until we find a good 
user for it, (and bc_file_(un)charge dont seem to be good users of it-
see above).



-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-13 Thread Srivatsa Vaddagiri
On Mon, Feb 12, 2007 at 10:40:52AM -0800, Paul Menage wrote:
 I did consider that approach at one point. The reason I rejected it
 was that then container-count would no longer even vaguely represent
 the number of processes in a container. Now that we have the
 container_group object, we have to use that for counting the number of
 processes in a container anyway, so that objection goes away.

Yep.

 However, I think it's important to be able to provide some kind of a
 reference count that subsystems can grab (e.g. to store a reference in
 a non-task object such as a file struct) without taking manage_mutex
 or callback_mutex (since that would be excessively heavyweight) but
 which can still be frozen at zero at the point when you're trying to
 destroy a container. 

Well, we already bump up reference count in fork() w/o grabbing those
mutexes don't we? Also if rmdir() sees container-count to be zero, then
it means no task is attached to the container. How will then a function
like bc_file_charge() bump up the reference count to such a container
(presuming it wanted to do so w/o manage/callback mutexes -and- that the
container pointer in bc_file_charge is derived from some task in 
that container). I think it is safe to bump up container-count in
bc_file_charge w/o grabbing manage/callback mutexes.

 Additionally, having it per subsystem will be
 important for when we implement arbitrary binding/unbinding of
 subsystems from hierarchies - at that point we need to be able know
 which subsystems have external reference counts, and hence aren't
 removeable.

Are you talking about (un)bind of subsystem to/from hierararchies that
have non-zero containers in them? That sounds very icky. Anyway that
doesnt seem to be supported in current patches.

Basically I felt we should defer introducing css_get/put until we find a good 
user for it, (and bc_file_(un)charge dont seem to be good users of it-
see above).



-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-12 Thread Paul Menage

On 2/12/07, Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:

On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote:
> +/*
> + * Call css_get() to hold a reference on the container; following a
> + * return of 0, this container subsystem state object is guaranteed
> + * not to be destroyed until css_put() is called on it.  A non-zero
> + * return code indicates that a reference could not be taken.
> + *
> + */
> +

Why can't we reuse container->count (or container_group->ref) to
refcount the per-subsystem object attached to a container? I think
that is how it is done for cpusets? That would make css_get/put
unnecessary?


I did consider that approach at one point. The reason I rejected it
was that then container->count would no longer even vaguely represent
the number of processes in a container. Now that we have the
container_group object, we have to use that for counting the number of
processes in a container anyway, so that objection goes away.

However, I think it's important to be able to provide some kind of a
reference count that subsystems can grab (e.g. to store a reference in
a non-task object such as a file struct) without taking manage_mutex
or callback_mutex (since that would be excessively heavyweight) but
which can still be "frozen" at zero at the point when you're trying to
destroy a container. Additionally, having it per subsystem will be
important for when we implement arbitrary binding/unbinding of
subsystems from hierarchies - at that point we need to be able know
which subsystems have external reference counts, and hence aren't
removeable.

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-12 Thread Paul Menage

On 2/12/07, Cedric Le Goater <[EMAIL PROTECTED]> wrote:

>> +#include 
>
> I did have a problem with this include.  On s390 it didn't exist so I've
> just been running without it (with no problems).  A quick 'find'
> suggests it only exists on x86_64, so I'd expect failures on all other
> arches.

confirmed on x86 also.



Sorry, this turned out to be left over from when I was debugging an
initialization problem that required early_printk(). I should have
removed it.

You can safely remove it from the patch.

Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-12 Thread Cedric Le Goater
>> +#include 
> 
> I did have a problem with this include.  On s390 it didn't exist so I've
> just been running without it (with no problems).  A quick 'find'
> suggests it only exists on x86_64, so I'd expect failures on all other
> arches.

confirmed on x86 also.

C.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-12 Thread Srivatsa Vaddagiri
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote:
> +/*
> + * Call css_get() to hold a reference on the container; following a
> + * return of 0, this container subsystem state object is guaranteed
> + * not to be destroyed until css_put() is called on it.  A non-zero
> + * return code indicates that a reference could not be taken.
> + *
> + */
> +

Why can't we reuse container->count (or container_group->ref) to
refcount the per-subsystem object attached to a container? I think
that is how it is done for cpusets? That would make css_get/put
unnecessary?

> +static inline int css_get(struct container_subsys_state *css)
> +{
> + int retval = 0;
> + unsigned long flags;
> + /* Synchronize with container_rmdir() */
> + spin_lock_irqsave(>refcnt_lock, flags);
> + if (atomic_read(>refcnt) >= 0) {
> + /* Container is still alive */
> + atomic_inc(>refcnt);
> + } else {
> + /* Container removal is in progress */
> + retval = -EINVAL;
> + }
> + spin_unlock_irqrestore(>refcnt_lock, flags);
> + return retval;
> +}

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-12 Thread Cedric Le Goater
 +#include asm/proto.h
 
 I did have a problem with this include.  On s390 it didn't exist so I've
 just been running without it (with no problems).  A quick 'find'
 suggests it only exists on x86_64, so I'd expect failures on all other
 arches.

confirmed on x86 also.

C.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-12 Thread Paul Menage

On 2/12/07, Cedric Le Goater [EMAIL PROTECTED] wrote:

 +#include asm/proto.h

 I did have a problem with this include.  On s390 it didn't exist so I've
 just been running without it (with no problems).  A quick 'find'
 suggests it only exists on x86_64, so I'd expect failures on all other
 arches.

confirmed on x86 also.



Sorry, this turned out to be left over from when I was debugging an
initialization problem that required early_printk(). I should have
removed it.

You can safely remove it from the patch.

Paul
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-12 Thread Srivatsa Vaddagiri
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote:
 +/*
 + * Call css_get() to hold a reference on the container; following a
 + * return of 0, this container subsystem state object is guaranteed
 + * not to be destroyed until css_put() is called on it.  A non-zero
 + * return code indicates that a reference could not be taken.
 + *
 + */
 +

Why can't we reuse container-count (or container_group-ref) to
refcount the per-subsystem object attached to a container? I think
that is how it is done for cpusets? That would make css_get/put
unnecessary?

 +static inline int css_get(struct container_subsys_state *css)
 +{
 + int retval = 0;
 + unsigned long flags;
 + /* Synchronize with container_rmdir() */
 + spin_lock_irqsave(css-refcnt_lock, flags);
 + if (atomic_read(css-refcnt) = 0) {
 + /* Container is still alive */
 + atomic_inc(css-refcnt);
 + } else {
 + /* Container removal is in progress */
 + retval = -EINVAL;
 + }
 + spin_unlock_irqrestore(css-refcnt_lock, flags);
 + return retval;
 +}

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers

2007-02-12 Thread Paul Menage

On 2/12/07, Srivatsa Vaddagiri [EMAIL PROTECTED] wrote:

On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote:
 +/*
 + * Call css_get() to hold a reference on the container; following a
 + * return of 0, this container subsystem state object is guaranteed
 + * not to be destroyed until css_put() is called on it.  A non-zero
 + * return code indicates that a reference could not be taken.
 + *
 + */
 +

Why can't we reuse container-count (or container_group-ref) to
refcount the per-subsystem object attached to a container? I think
that is how it is done for cpusets? That would make css_get/put
unnecessary?


I did consider that approach at one point. The reason I rejected it
was that then container-count would no longer even vaguely represent
the number of processes in a container. Now that we have the
container_group object, we have to use that for counting the number of
processes in a container anyway, so that objection goes away.

However, I think it's important to be able to provide some kind of a
reference count that subsystems can grab (e.g. to store a reference in
a non-task object such as a file struct) without taking manage_mutex
or callback_mutex (since that would be excessively heavyweight) but
which can still be frozen at zero at the point when you're trying to
destroy a container. Additionally, having it per subsystem will be
important for when we implement arbitrary binding/unbinding of
subsystems from hierarchies - at that point we need to be able know
which subsystems have external reference counts, and hence aren't
removeable.

Paul
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/