Re: [ckrm-tech] [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote: > +static int attach_task(struct container *cont, struct task_struct *tsk) > { [snip] > + for_each_subsys(h, ss) { > + if (ss->can_attach) { > + retval = ss->can_attach(ss, cont, tsk); > + if (retval) { > + put_task_struct(tsk); We don't need this put_task_struct(), since our caller attach_task_by_pid() would do it for us. > + return retval; > + } > + } > } > -} > > + /* Locate or allocate a new container_group for this task, > + * based on its final set of containers */ > + oldcg = tsk->containers; > + newcg = find_container_group(oldcg, cont); > + if (!newcg) { > + put_task_struct(tsk); Ditto > + return -ENOMEM; > + } -- Regards, vatsa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [ckrm-tech] [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote: +static int attach_task(struct container *cont, struct task_struct *tsk) { [snip] + for_each_subsys(h, ss) { + if (ss-can_attach) { + retval = ss-can_attach(ss, cont, tsk); + if (retval) { + put_task_struct(tsk); We don't need this put_task_struct(), since our caller attach_task_by_pid() would do it for us. + return retval; + } + } } -} + /* Locate or allocate a new container_group for this task, + * based on its final set of containers */ + oldcg = tsk-containers; + newcg = find_container_group(oldcg, cont); + if (!newcg) { + put_task_struct(tsk); Ditto + return -ENOMEM; + } -- Regards, vatsa - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote: > +static struct container_group *find_container_group( > + struct container_group *oldcg, struct container *cont) > +{ > + struct container_group *res; > + struct container_subsys *ss; > + int h = cont->hierarchy; > + int i; > + > + BUG_ON(oldcg->container[h] == cont); > + /* First see if we already have a container group that matches > + * the desired set */ > + spin_lock(_group_lock); I think this should be spin_lock_irq(). I have been bitten by lockdep warning in the rcfs patchs. But I think they should apply here as well. Essentially, container_task_count() is called from cpuset.c after doing write_lock_irq(tasklist_lock). This creates the dependency: tasklist_lock (irqs disabled) -> container_group_lock However in above function, find_container_group (and several other functions) we take container_group_lock w/o blocking interrupts. An interrupt handler now trying to take tasklist_lock will cause a deadlock. -- Regards, vatsa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote: +static struct container_group *find_container_group( + struct container_group *oldcg, struct container *cont) +{ + struct container_group *res; + struct container_subsys *ss; + int h = cont-hierarchy; + int i; + + BUG_ON(oldcg-container[h] == cont); + /* First see if we already have a container group that matches + * the desired set */ + spin_lock(container_group_lock); I think this should be spin_lock_irq(). I have been bitten by lockdep warning in the rcfs patchs. But I think they should apply here as well. Essentially, container_task_count() is called from cpuset.c after doing write_lock_irq(tasklist_lock). This creates the dependency: tasklist_lock (irqs disabled) - container_group_lock However in above function, find_container_group (and several other functions) we take container_group_lock w/o blocking interrupts. An interrupt handler now trying to take tasklist_lock will cause a deadlock. -- Regards, vatsa - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On 2/13/07, Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote: Well, we already bump up reference count in fork() w/o grabbing those mutexes don't we? Also if rmdir() sees container->count to be zero, then it means no task is attached to the container. How will then a function like bc_file_charge() bump up the reference count to such a container (presuming it wanted to do so w/o manage/callback mutexes -and- that the container pointer in bc_file_charge is derived from some task in that container). I think it is safe to bump up container->count in bc_file_charge w/o grabbing manage/callback mutexes. Right, I was never suggesting that we take either of those mutexes for this operation. The spin lock in css_get() was an attempt to avoid that. But I think you're right that it was too heavyweight, and can be avoided with atomic operations. See my other email to Pavel. Are you talking about (un)bind of subsystem to/from hierararchies that have non-zero containers in them? That sounds very icky. Anyway that doesnt seem to be supported in current patches. The bind/unbind from active hierarchies is supported in the user-space API, and it's implemented for hierarchies that have no child containers. Hence it's important, at least conceptually, for the reference count to be held by the subsystem state rather than the container. Implementing a full bind/unbind for arbitrary subsystems and hierarchies will indeed be a lot of work, which is why I'm not trying to do it at this point. Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
[EMAIL PROTECTED] wrote: This patch removes all cpuset-specific knowlege from the container system, replacing it with a generic API that can be used by multiple subsystems. Cpusets is adapted to be a container subsystem. Signed-off-by: Paul Menage <[EMAIL PROTECTED]> Hi, Paul, This patch fails to apply for me [EMAIL PROTECTED]:~/ocbalbir/images/kernels/containers/linux-2.6.20$ pushpatch patching file include/linux/container.h patching file include/linux/cpuset.h patching file kernel/container.c patch: malformed patch at line 640: @@ -202,15 +418,18 @@ static DEFINE_MUTEX(callback_mutex); multiuser_container does not apply Is anybody else seeing this problem? -- Balbir Singh Linux Technology Center IBM, ISTL - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
[EMAIL PROTECTED] wrote: This patch removes all cpuset-specific knowlege from the container system, replacing it with a generic API that can be used by multiple subsystems. Cpusets is adapted to be a container subsystem. Signed-off-by: Paul Menage [EMAIL PROTECTED] Hi, Paul, This patch fails to apply for me [EMAIL PROTECTED]:~/ocbalbir/images/kernels/containers/linux-2.6.20$ pushpatch patching file include/linux/container.h patching file include/linux/cpuset.h patching file kernel/container.c patch: malformed patch at line 640: @@ -202,15 +418,18 @@ static DEFINE_MUTEX(callback_mutex); multiuser_container does not apply Is anybody else seeing this problem? -- Balbir Singh Linux Technology Center IBM, ISTL - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On 2/13/07, Srivatsa Vaddagiri [EMAIL PROTECTED] wrote: Well, we already bump up reference count in fork() w/o grabbing those mutexes don't we? Also if rmdir() sees container-count to be zero, then it means no task is attached to the container. How will then a function like bc_file_charge() bump up the reference count to such a container (presuming it wanted to do so w/o manage/callback mutexes -and- that the container pointer in bc_file_charge is derived from some task in that container). I think it is safe to bump up container-count in bc_file_charge w/o grabbing manage/callback mutexes. Right, I was never suggesting that we take either of those mutexes for this operation. The spin lock in css_get() was an attempt to avoid that. But I think you're right that it was too heavyweight, and can be avoided with atomic operations. See my other email to Pavel. Are you talking about (un)bind of subsystem to/from hierararchies that have non-zero containers in them? That sounds very icky. Anyway that doesnt seem to be supported in current patches. The bind/unbind from active hierarchies is supported in the user-space API, and it's implemented for hierarchies that have no child containers. Hence it's important, at least conceptually, for the reference count to be held by the subsystem state rather than the container. Implementing a full bind/unbind for arbitrary subsystems and hierarchies will indeed be a lot of work, which is why I'm not trying to do it at this point. Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On Mon, Feb 12, 2007 at 10:40:52AM -0800, Paul Menage wrote: > I did consider that approach at one point. The reason I rejected it > was that then container->count would no longer even vaguely represent > the number of processes in a container. Now that we have the > container_group object, we have to use that for counting the number of > processes in a container anyway, so that objection goes away. Yep. > However, I think it's important to be able to provide some kind of a > reference count that subsystems can grab (e.g. to store a reference in > a non-task object such as a file struct) without taking manage_mutex > or callback_mutex (since that would be excessively heavyweight) but > which can still be "frozen" at zero at the point when you're trying to > destroy a container. Well, we already bump up reference count in fork() w/o grabbing those mutexes don't we? Also if rmdir() sees container->count to be zero, then it means no task is attached to the container. How will then a function like bc_file_charge() bump up the reference count to such a container (presuming it wanted to do so w/o manage/callback mutexes -and- that the container pointer in bc_file_charge is derived from some task in that container). I think it is safe to bump up container->count in bc_file_charge w/o grabbing manage/callback mutexes. > Additionally, having it per subsystem will be > important for when we implement arbitrary binding/unbinding of > subsystems from hierarchies - at that point we need to be able know > which subsystems have external reference counts, and hence aren't > removeable. Are you talking about (un)bind of subsystem to/from hierararchies that have non-zero containers in them? That sounds very icky. Anyway that doesnt seem to be supported in current patches. Basically I felt we should defer introducing css_get/put until we find a good user for it, (and bc_file_(un)charge dont seem to be good users of it- see above). -- Regards, vatsa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On Mon, Feb 12, 2007 at 10:40:52AM -0800, Paul Menage wrote: I did consider that approach at one point. The reason I rejected it was that then container-count would no longer even vaguely represent the number of processes in a container. Now that we have the container_group object, we have to use that for counting the number of processes in a container anyway, so that objection goes away. Yep. However, I think it's important to be able to provide some kind of a reference count that subsystems can grab (e.g. to store a reference in a non-task object such as a file struct) without taking manage_mutex or callback_mutex (since that would be excessively heavyweight) but which can still be frozen at zero at the point when you're trying to destroy a container. Well, we already bump up reference count in fork() w/o grabbing those mutexes don't we? Also if rmdir() sees container-count to be zero, then it means no task is attached to the container. How will then a function like bc_file_charge() bump up the reference count to such a container (presuming it wanted to do so w/o manage/callback mutexes -and- that the container pointer in bc_file_charge is derived from some task in that container). I think it is safe to bump up container-count in bc_file_charge w/o grabbing manage/callback mutexes. Additionally, having it per subsystem will be important for when we implement arbitrary binding/unbinding of subsystems from hierarchies - at that point we need to be able know which subsystems have external reference counts, and hence aren't removeable. Are you talking about (un)bind of subsystem to/from hierararchies that have non-zero containers in them? That sounds very icky. Anyway that doesnt seem to be supported in current patches. Basically I felt we should defer introducing css_get/put until we find a good user for it, (and bc_file_(un)charge dont seem to be good users of it- see above). -- Regards, vatsa - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On 2/12/07, Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote: On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote: > +/* > + * Call css_get() to hold a reference on the container; following a > + * return of 0, this container subsystem state object is guaranteed > + * not to be destroyed until css_put() is called on it. A non-zero > + * return code indicates that a reference could not be taken. > + * > + */ > + Why can't we reuse container->count (or container_group->ref) to refcount the per-subsystem object attached to a container? I think that is how it is done for cpusets? That would make css_get/put unnecessary? I did consider that approach at one point. The reason I rejected it was that then container->count would no longer even vaguely represent the number of processes in a container. Now that we have the container_group object, we have to use that for counting the number of processes in a container anyway, so that objection goes away. However, I think it's important to be able to provide some kind of a reference count that subsystems can grab (e.g. to store a reference in a non-task object such as a file struct) without taking manage_mutex or callback_mutex (since that would be excessively heavyweight) but which can still be "frozen" at zero at the point when you're trying to destroy a container. Additionally, having it per subsystem will be important for when we implement arbitrary binding/unbinding of subsystems from hierarchies - at that point we need to be able know which subsystems have external reference counts, and hence aren't removeable. Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On 2/12/07, Cedric Le Goater <[EMAIL PROTECTED]> wrote: >> +#include > > I did have a problem with this include. On s390 it didn't exist so I've > just been running without it (with no problems). A quick 'find' > suggests it only exists on x86_64, so I'd expect failures on all other > arches. confirmed on x86 also. Sorry, this turned out to be left over from when I was debugging an initialization problem that required early_printk(). I should have removed it. You can safely remove it from the patch. Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
>> +#include > > I did have a problem with this include. On s390 it didn't exist so I've > just been running without it (with no problems). A quick 'find' > suggests it only exists on x86_64, so I'd expect failures on all other > arches. confirmed on x86 also. C. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote: > +/* > + * Call css_get() to hold a reference on the container; following a > + * return of 0, this container subsystem state object is guaranteed > + * not to be destroyed until css_put() is called on it. A non-zero > + * return code indicates that a reference could not be taken. > + * > + */ > + Why can't we reuse container->count (or container_group->ref) to refcount the per-subsystem object attached to a container? I think that is how it is done for cpusets? That would make css_get/put unnecessary? > +static inline int css_get(struct container_subsys_state *css) > +{ > + int retval = 0; > + unsigned long flags; > + /* Synchronize with container_rmdir() */ > + spin_lock_irqsave(>refcnt_lock, flags); > + if (atomic_read(>refcnt) >= 0) { > + /* Container is still alive */ > + atomic_inc(>refcnt); > + } else { > + /* Container removal is in progress */ > + retval = -EINVAL; > + } > + spin_unlock_irqrestore(>refcnt_lock, flags); > + return retval; > +} -- Regards, vatsa - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
+#include asm/proto.h I did have a problem with this include. On s390 it didn't exist so I've just been running without it (with no problems). A quick 'find' suggests it only exists on x86_64, so I'd expect failures on all other arches. confirmed on x86 also. C. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On 2/12/07, Cedric Le Goater [EMAIL PROTECTED] wrote: +#include asm/proto.h I did have a problem with this include. On s390 it didn't exist so I've just been running without it (with no problems). A quick 'find' suggests it only exists on x86_64, so I'd expect failures on all other arches. confirmed on x86 also. Sorry, this turned out to be left over from when I was debugging an initialization problem that required early_printk(). I should have removed it. You can safely remove it from the patch. Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote: +/* + * Call css_get() to hold a reference on the container; following a + * return of 0, this container subsystem state object is guaranteed + * not to be destroyed until css_put() is called on it. A non-zero + * return code indicates that a reference could not be taken. + * + */ + Why can't we reuse container-count (or container_group-ref) to refcount the per-subsystem object attached to a container? I think that is how it is done for cpusets? That would make css_get/put unnecessary? +static inline int css_get(struct container_subsys_state *css) +{ + int retval = 0; + unsigned long flags; + /* Synchronize with container_rmdir() */ + spin_lock_irqsave(css-refcnt_lock, flags); + if (atomic_read(css-refcnt) = 0) { + /* Container is still alive */ + atomic_inc(css-refcnt); + } else { + /* Container removal is in progress */ + retval = -EINVAL; + } + spin_unlock_irqrestore(css-refcnt_lock, flags); + return retval; +} -- Regards, vatsa - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] containers (V7): Add generic multi-subsystem API to containers
On 2/12/07, Srivatsa Vaddagiri [EMAIL PROTECTED] wrote: On Mon, Feb 12, 2007 at 12:15:24AM -0800, [EMAIL PROTECTED] wrote: +/* + * Call css_get() to hold a reference on the container; following a + * return of 0, this container subsystem state object is guaranteed + * not to be destroyed until css_put() is called on it. A non-zero + * return code indicates that a reference could not be taken. + * + */ + Why can't we reuse container-count (or container_group-ref) to refcount the per-subsystem object attached to a container? I think that is how it is done for cpusets? That would make css_get/put unnecessary? I did consider that approach at one point. The reason I rejected it was that then container-count would no longer even vaguely represent the number of processes in a container. Now that we have the container_group object, we have to use that for counting the number of processes in a container anyway, so that objection goes away. However, I think it's important to be able to provide some kind of a reference count that subsystems can grab (e.g. to store a reference in a non-task object such as a file struct) without taking manage_mutex or callback_mutex (since that would be excessively heavyweight) but which can still be frozen at zero at the point when you're trying to destroy a container. Additionally, having it per subsystem will be important for when we implement arbitrary binding/unbinding of subsystems from hierarchies - at that point we need to be able know which subsystems have external reference counts, and hence aren't removeable. Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/