Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Sat, Mar 24, 2007 at 10:43:48PM -0700, Paul Jackson wrote:
> I'm unsure here, but this 'tsk->cpuset == cs' test feels fragile to me.
>
> How about a bit earlier in attach_task(), right at the point we overwrite
> the victim task's cpuset pointer, we decrement the count on the old cpuset,
> and if it went to zero, remember that we'll need to release it, once we've
> dropped some locks:

Looks neater. Will adopt this when I send the patch. Thanks.

--
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
vatsa wrote:
> Not just this, continuing further we have more trouble:
>
> CPU0 (attach_task T1 to CS2)            CPU1 (T1 is exiting)
>
> synchronize_rcu()
>                                         atomic_dec(&CS1->count);
>                                         [CS1->count = 0]
> if atomic_dec_and_test(&oldcs->count))
> [CS1->count = -1]
...
> 2nd race is tricky. We probably need to do this to avoid it:
>
>         task_lock(tsk);
>
>         /* Check if tsk->cpuset is still same. We may have raced with
>          * cpuset_exit changing tsk->cpuset again under our feet.
>          */
>         if (tsk->cpuset == cs && atomic_dec_and_test(&oldcs->count)) {

I'm unsure here, but this 'tsk->cpuset == cs' test feels fragile to me.

How about a bit earlier in attach_task(), right at the point we overwrite
the victim task's cpuset pointer, we decrement the count on the old cpuset,
and if it went to zero, remember that we'll need to release it, once we've
dropped some locks:

static int attach_task(struct cpuset *cs, char *pidbuf, char **ppathbuf)
{
	...
	struct cpuset *oldcs;
	struct cpuset *oldcs_tobe_released;
	...
	task_lock(tsk);
	oldcs = tsk->cpuset;
	...
	if (tsk->flags & PF_EXITING) {
		...
	}
	atomic_inc(&cs->count);
	rcu_assign_pointer(tsk->cpuset, cs);
	oldcs_tobe_released = NULL;
	if (atomic_dec_and_test(&oldcs->count))
		oldcs_tobe_released = oldcs;
	task_unlock(tsk);
	...
	put_task_struct(tsk);
	synchronize_rcu();
	if (oldcs_tobe_released)
		check_for_release(oldcs_tobe_released, ppathbuf);
	return 0;
}

--
                 I won't rest till it's the best ...
                 Programmer, Linux Scalability
                 Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
> I will try to send out a patch later today to fix

Thanks!

> Agreed, but good to keep code clean isn't it? :)

Definitely.

--
                 I won't rest till it's the best ...
                 Programmer, Linux Scalability
                 Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Sat, Mar 24, 2007 at 09:45:50PM -0700, Paul Jackson wrote:
> Nice work - thanks. Yes, both an extra cpuset count and a negative
> cpuset count are bad news, opening the door to the usual catastrophes.
>
> Would you like the honor of submitting the patch to add a task_lock
> to cpuset_exit()? If you do, be sure to fix, or at least remove,
> the cpuset_exit comment lines:

I will try to send out a patch later today to fix this bug in mainline
cpuset code. I happened to notice this race with my rcfs patch and
observed that the same is true of the cpuset/container code also.

> * We don't need to task_lock() this reference to tsk->cpuset,
> * because tsk is already marked PF_EXITING, so attach_task() won't
> * mess with it, or task is a failed fork, never visible to attach_task.

Sure, I had seen that.

> So, in real life, this would be a difficult race to trigger.

Agreed, but good to keep code clean isn't it? :)

> Thanks for finding this.

Welcome!

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
vatsa wrote:
> Now consider:

Nice work - thanks. Yes, both an extra cpuset count and a negative
cpuset count are bad news, opening the door to the usual catastrophes.

Would you like the honor of submitting the patch to add a task_lock
to cpuset_exit()? If you do, be sure to fix, or at least remove,
the cpuset_exit comment lines:

 * We don't need to task_lock() this reference to tsk->cpuset,
 * because tsk is already marked PF_EXITING, so attach_task() won't
 * mess with it, or task is a failed fork, never visible to attach_task.

I guess that taking task_lock() in cpuset_exit() should not be a serious
performance issue. It's taking a spinlock that is in the current exiting
task's task struct, so it should be a cache hot memory line and a rarely
contested lock.

And I guess I've not seen this race in real life, as one side of it has
to execute quite a bit of code in the task exit path, from when it sets
PF_EXITING until it gets into the cpuset_exit() call, while the other
side does the three lines:

	if (tsk->flags & PF_EXITING) ...
	atomic_inc(&cs->count);
	rcu_assign_pointer(tsk->cpuset, cs);

So, in real life, this would be a difficult race to trigger.

Thanks for finding this.

--
                 I won't rest till it's the best ...
                 Programmer, Linux Scalability
                 Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Sun, Mar 25, 2007 at 07:58:16AM +0530, Srivatsa Vaddagiri wrote:
> Not just this, continuing further we have more trouble:
>
> CPU0 (attach_task T1 to CS2)            CPU1 (T1 is exiting)
>
> synchronize_rcu()
>                                         atomic_dec(&CS1->count);
>                                         [CS1->count = 0]
> if atomic_dec_and_test(&oldcs->count))
> [CS1->count = -1]
>
> We now have CS1->count negative. Is that good? I am uncomfortable ..
>
> We need a task_lock() in cpuset_exit to avoid this race.

2nd race is tricky. We probably need to do this to avoid it:

	task_lock(tsk);

	/* Check if tsk->cpuset is still same. We may have raced with
	 * cpuset_exit changing tsk->cpuset again under our feet.
	 */
	if (tsk->cpuset == cs && atomic_dec_and_test(&oldcs->count)) {
		task_unlock(tsk);
		check_for_release(oldcs, ppathbuf);
		goto done;
	}
	task_unlock(tsk);

done:
	return 0;

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Sat, Mar 24, 2007 at 06:41:28PM -0700, Paul Jackson wrote:
> > the following code becomes racy with cpuset_exit() ...
> >
> > 	atomic_inc(&cs->count);
> > 	rcu_assign_pointer(tsk->cpuset, cs);
> > 	task_unlock(tsk);
>
> eh ... so ... ?
>
> I don't know of any sequence where that causes any problem.
>
> Do you see one?

Let's say we had two cpusets CS1 and CS2 (both different from top_cpuset).
CS1 has just one task T1 in it (CS1->count = 1) while CS2 has no tasks in
it (CS2->count = 0). Now consider:

CPU0 (attach_task T1 to CS2)            CPU1 (T1 is exiting)

task_lock(T1);

oldcs = tsk->cpuset;
[oldcs = CS1]

T1->flags & PF_EXITING? (No)
                                        T1->flags = PF_EXITING;

atomic_inc(&CS2->count);
                                        cpuset_exit()
                                        cs = tsk->cpuset; (cs = CS1)
T1->cpuset = CS2;
                                        T1->cpuset = &top_cpuset;

task_unlock(T1);

CS2 has one bogus count now (with no tasks in it), which may prevent it
from being removed/freed forever.

Not just this, continuing further we have more trouble:

CPU0 (attach_task T1 to CS2)            CPU1 (T1 is exiting)

synchronize_rcu()
                                        atomic_dec(&CS1->count);
                                        [CS1->count = 0]
if atomic_dec_and_test(&oldcs->count))
[CS1->count = -1]

We now have CS1->count negative. Is that good? I am uncomfortable ..

We need a task_lock() in cpuset_exit to avoid this race.

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
vatsa wrote:
> > 	if (tsk->flags & PF_EXITING) {
>
> What if PF_EXITING is set after this check? If that happens then,
>
> > 		task_unlock(tsk);
> > 		mutex_unlock(&callback_mutex);
> > 		put_task_struct(tsk);
> > 		return -ESRCH;
> > 	}
>
> the following code becomes racy with cpuset_exit() ...
>
> 	atomic_inc(&cs->count);
> 	rcu_assign_pointer(tsk->cpuset, cs);
> 	task_unlock(tsk);

eh ... so ... ?

I don't know of any sequence where that causes any problem.

Do you see one?

--
                 I won't rest till it's the best ...
                 Programmer, Linux Scalability
                 Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Sat, Mar 24, 2007 at 12:25:59PM -0700, Paul Jackson wrote:
> > P.S : cpuset.c checks for PF_EXITING twice in attach_task(), while this
> > patch seems to be checking only once. Is that fine?
>
> I think the cpuset code is ok, because, as you note, it locks the task,
> picks off the cpuset pointer, and then checks a second time that the
> task still does not have PF_EXITING set:

Well, afaics, PF_EXITING is set for the exiting task w/o taking any lock,
which makes this always racy.

> In the kernel/cpuset.c code for attach_task():
>
> 	task_lock(tsk);
> 	oldcs = tsk->cpuset;
> 	/*
> 	 * After getting 'oldcs' cpuset ptr, be sure still not exiting.
> 	 * If 'oldcs' might be the top_cpuset due to the_top_cpuset_hack
> 	 * then fail this attach_task(), to avoid breaking top_cpuset.count.
> 	 */
> 	if (tsk->flags & PF_EXITING) {

What if PF_EXITING is set after this check? If that happens then,

> 		task_unlock(tsk);
> 		mutex_unlock(&callback_mutex);
> 		put_task_struct(tsk);
> 		return -ESRCH;
> 	}

the following code becomes racy with cpuset_exit() ...

	atomic_inc(&cs->count);
	rcu_assign_pointer(tsk->cpuset, cs);
	task_unlock(tsk);

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
> IMO, we need to use task_lock() in container_exit() to avoid this race.
>
> (I think this race already exists in mainline cpuset.c?)
>
> P.S : cpuset.c checks for PF_EXITING twice in attach_task(), while this
> patch seems to be checking only once. Is that fine?

I think the cpuset code is ok, because, as you note, it locks the task,
picks off the cpuset pointer, and then checks a second time that the
task still does not have PF_EXITING set:

In the kernel/cpuset.c code for attach_task():

	task_lock(tsk);
	oldcs = tsk->cpuset;
	/*
	 * After getting 'oldcs' cpuset ptr, be sure still not exiting.
	 * If 'oldcs' might be the top_cpuset due to the_top_cpuset_hack
	 * then fail this attach_task(), to avoid breaking top_cpuset.count.
	 */
	if (tsk->flags & PF_EXITING) {
		task_unlock(tsk);
		mutex_unlock(&callback_mutex);
		put_task_struct(tsk);
		return -ESRCH;
	}

--
                 I won't rest till it's the best ...
                 Programmer, Linux Scalability
                 Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Mon, Feb 12, 2007 at 12:15:22AM -0800, [EMAIL PROTECTED] wrote:
> +static int attach_task(struct container *cont, char *pidbuf, char **ppathbuf)
> +{
> +	pid_t pid;
> +	struct task_struct *tsk;
> +	struct container *oldcont;
> +	int retval;
> +
> +	if (sscanf(pidbuf, "%d", &pid) != 1)
> +		return -EIO;
> +
> +	if (pid) {
> +		read_lock(&tasklist_lock);
> +
> +		tsk = find_task_by_pid(pid);
> +		if (!tsk || tsk->flags & PF_EXITING) {

This is probably carrying over code from cpuset.c, but:

/me thinks that there is an ugly race here with 'tsk' exiting.

What happens if the tsk is marked PF_EXITING just after this check? If that
happens, then:

> +			read_unlock(&tasklist_lock);
> +			return -ESRCH;
> +		}
> +
> +		get_task_struct(tsk);
> +		read_unlock(&tasklist_lock);
> +
> +		if ((current->euid) && (current->euid != tsk->uid)
> +		    && (current->euid != tsk->suid)) {
> +			put_task_struct(tsk);
> +			return -EACCES;
> +		}
> +	} else {
> +		tsk = current;
> +		get_task_struct(tsk);
> +	}
> +
> +	retval = security_task_setscheduler(tsk, 0, NULL);
> +	if (retval) {
> +		put_task_struct(tsk);
> +		return retval;
> +	}
> +
> +	mutex_lock(&callback_mutex);
> +
> +	task_lock(tsk);
> +	oldcont = tsk->container;
> +	if (!oldcont) {
> +		task_unlock(tsk);
> +		mutex_unlock(&callback_mutex);
> +		put_task_struct(tsk);
> +		return -ESRCH;
> +	}
> +	atomic_inc(&cont->count);
> +	rcu_assign_pointer(tsk->container, cont);

Above assignment A1 can race with below assignment A2 in container_exit():

	tsk->container = &top_container;  /* the_top_container_hack - see above */

What happens if A1 follows after A2? I feel very uncomfortable abt it.
IMO, we need to use task_lock() in container_exit() to avoid this race.

(I think this race already exists in mainline cpuset.c?)

P.S : cpuset.c checks for PF_EXITING twice in attach_task(), while this
patch seems to be checking only once. Is that fine?
--
Regards,
vatsa
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Thu, Mar 22, 2007 at 03:26:51PM +0530, Srivatsa Vaddagiri wrote:
> On Mon, Feb 12, 2007 at 12:15:22AM -0800, [EMAIL PROTECTED] wrote:
> > +static void remove_dir(struct dentry *d)
> > +{
> > +	struct dentry *parent = dget(d->d_parent);
>
> Don't we need to lock parent inode's mutex here? sysfs seems to take
> that lock.

Never mind. Maneesh pointed me to do_rmdir() in VFS, which takes that
mutex as well.

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Mon, Feb 12, 2007 at 12:15:22AM -0800, [EMAIL PROTECTED] wrote:
> +static void remove_dir(struct dentry *d)
> +{
> +	struct dentry *parent = dget(d->d_parent);

Don't we need to lock parent inode's mutex here? sysfs seems to take
that lock.

> +
> +	d_delete(d);
> +	simple_rmdir(parent->d_inode, d);
> +	dput(parent);
> +}

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Sun, Mar 11, 2007 at 12:38:43PM -0700, Paul Jackson wrote:
> The primary reason for the cpuset double locking, as I recall, was because
> cpusets needs to access cpusets inside the memory allocator.

"needs to access cpusets" - can you be more specific? Being able to
safely walk the cpuset->parent list - is that the only access you are
talking of, or more?

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
vatsa wrote:
> Yes, that way only the hierarchy hosting cpusets takes the hit of
> double-locking. cpuset_subsys->create/destroy can take this additional lock
> inside cpuset.c.

The primary reason for the cpuset double locking, as I recall, was because
cpusets needs to access cpusets inside the memory allocator.

A single, straightforward cpuset lock failed under the following common
scenario:

 1) user does a cpuset system call (writes some file below /dev/cpuset, e.g.)
 2) kernel cpuset code locks its lock
 3) cpuset code asks to allocate some memory for some cpuset structure
 4) memory allocator tries to lock the cpuset lock - deadlock!

The reason that the memory allocator needs the cpuset lock is to check
whether the memory nodes the current task is allowed to use have changed,
due to changes in the current task's cpuset.

A secondary reason that the cpuset code needs two locks is because the main
cpuset lock is a long-held, system-wide lock, and various low-level bits of
performance-critical code sometimes require quick, read-only access to
cpusets.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <[EMAIL PROTECTED]> 1.925.600.0401
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On 3/8/07, Srivatsa Vaddagiri <[EMAIL PROTECTED]> wrote:
> On Wed, Mar 07, 2007 at 12:50:03PM -0800, Paul Menage wrote:
> > The callback mutex (which is what container_lock() actually locks) is
> > also used to synchronize fork/exit against subsystem additions, in the
> > event that some subsystem has registered fork or exit callbacks. We
> > could probably have a separate subsystem_mutex for that instead.
>
> Why can't manage_mutex itself be used there (to serialize fork/exit
> callbacks against modification to the hierarchy)?

Because manage_mutex can be held for very long periods of time. I think
that a combination of a new lock that's only taken by fork/exit and
register_subsys, plus task_lock (which prevents the current task from
being moved), would be more lightweight.

Paul
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Wed, Mar 07, 2007 at 12:50:03PM -0800, Paul Menage wrote:
> The callback mutex (which is what container_lock() actually locks) is
> also used to synchronize fork/exit against subsystem additions, in the
> event that some subsystem has registered fork or exit callbacks. We
> could probably have a separate subsystem_mutex for that instead.

Why can't manage_mutex itself be used there (to serialize fork/exit
callbacks against modification to the hierarchy)?

> Apart from that, yes, it may well be possible to move callback lock
> entirely inside cpusets.

Yes, that way only the hierarchy hosting cpusets takes the hit of
double-locking. cpuset_subsys->create/destroy can take this additional
lock inside cpuset.c.

-- 
Regards,
vatsa
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Wed, Mar 07, 2007 at 05:51:17PM +0530, Srivatsa Vaddagiri wrote:
> If that is the case, I think we can push container_lock entirely inside
> cpuset.c and not have others exposed to this double-lock complexity.
> This is possible because cpuset.c (built on top of containers) still has
> cpuset->parent, and walking the cpuset->parent list safely can be made
> possible with a second lock which is local to cpuset.c.

Hope I am not shooting in the dark here! If we can indeed avoid
container_lock in generic code, it will simplify code like attach_task
(for ex: post_attach/attach can be clubbed into one single callback).

-- 
Regards,
vatsa
Re: [ckrm-tech] [PATCH 1/7] containers (V7): Generic container system abstracted from cpusets code
On Tue, Feb 13, 2007 at 11:18:57AM +0530, Srivatsa Vaddagiri wrote:
> Which makes me wonder why we need task_lock() at all .. I can understand
> the need for a lock like that if we are reading/updating multiple words
> in task_struct under the lock. In this case, it is used to read/write
> just one pointer, isn't it? I think it can be eliminated altogether
> with the use of RCU.

I see that cpuset.c uses task_lock to read/write multiple words
(cpuset_update_task_memory_state) .. so yes, it is necessary in
attach_task() ..

-- 
Regards,
vatsa