Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Fri, Aug 19, 2005 at 05:27:24PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > I have indeed been thinking along these lines, but all of the devil plans
> > that I have come up thus far are quite intrusive, and might have some
> > performance problems in some situations.  So it seems best to remove
> > tasklist_lock in steps:
> >
> > 1.	Single-recipient catch and ignore cases.
> >
> > 2.	Single-recipient stop/continue cases.
> >
> > 3.	Single-recipient fatal cases.
> >
> > 4.	Single-process multi-threaded stop/continue cases.
> >
> > 5.	Single-process multi-threaded fatal cases.
> >
> > 6.	And on to process-group cases.
>
> Paul, I am not an expert at all, but honestly I don't see how
> this could be achieved. This lock is heavily overloaded for
> quite different purposes. I think that maybe it makes sense
> to try other steps, for example (random order):
>
> 1. Tasklist protects ->sighand changing (de_thread) - rework
>    sighand access/locking.

We have a start on this, and this is the first area I am making tests for.

> 2. Tasklist protects reparenting - fix this.

Good point...  Will require thought, and tests that target this race.
The tests are what I am working on now.

> ...
>
> N. PTRACE!!! Well, I close my eyes immediately when I see this
>    word in the sources.

This one as well.  Hmmm...  The tests could be fun -- race gdb grabbing
a process with sending that process a signal.  Might need to instead
make a small program that pretends to be gdb; will play with it.

Some others that come to mind:

N+1.	Process-group leader dies while process-group signal is being
	received.

N+2.	Process-group members are wildly creating new processes while a
	fatal signal is being received (which should result in -all-
	processes in the group dying).

N+3.	Threads are being wildly created in a multithreaded process
	while a fatal signal is being received...

N+4.	The tsk->signal->curr_target in a multithreaded process changes
	while a signal is being received.
N+5.	The handler state (default/ignored/caught) changes while a
	signal is being received.  But siglock should cover this.

N+6.	Different signal reception orders for different tasks in a
	given set of groups (e.g., fatal signal to process group at the
	same time as a fatal signal to a multithreaded process within
	that group).  Don't believe that this one matters, but thought
	I should call it out anyway.

o	SIGHUP/SIGCONT from tty going away vs. SIGSTOP from elsewhere.

o	tty signal (ctrl-Z, ctrl-C, ...) vs. signal to a specific
	process in the tty's process group.

And there are probably more, but this should keep me busy for a bit.

> Only then we can eliminate tasklist locking from signal sending
> path. But I don't see the easy way to solve any of these 1 - N
> problems.

If it was easy, someone would likely already have done it...

> Currently I don't see how your patch could be "fixed" for SIGCONT
> case, except very ugly:
>
> 	kill_proc_info(sig)
> 	{
> 		p = find_task_by_pid(pid);
> 		if (sig == SIGCONT)
> 			read_lock(&tasklist_lock);
>
> 		error = group_send_sig_info(...);
> 		...
> 	}

I was indeed thinking along these lines.

> But there are other problems too.
>
> Look at __group_complete_signal(), it scans p->pids[PIDTYPE_TGID].pid_list
> list to find a suitable thread. What if 'p' does clone(CLONE_THREAD) now?
> Let's look at copy_process(), it does attach_pid(p, PIDTYPE_TGID, p->tgid)
> under the lovely tasklist_lock again.

Another approach that I am considering for this class of problem is to
have the code paths that create or change tasks momentarily acquire
locks corresponding to the groups of tasks that can receive signals.
There are a number of problems with this: for example, the logical
place for the lock corresponding to a process group is the
process-group leader, so there has to be a way to handle the case where
the process-group leader dies before the rest of the processes do.
And so on, for similar changes.
> So, I don't believe we can solve even the simplest case (single-recipient,
> non fatal, non stop/cont) without significant locking rework.

Agreed, hence the "Not-signed-off-by:" on my original patch.

> I hope that your patch will stimulate this work.

Me too, will continue working on it.  Your comments have been very
helpful!  If nothing else, this thread has shown that there are some
worthwhile benefits from getting rid of tasklist_lock...

						Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
Paul E. McKenney wrote:
>
> I have indeed been thinking along these lines, but all of the devil plans
> that I have come up thus far are quite intrusive, and might have some
> performance problems in some situations.  So it seems best to remove
> tasklist_lock in steps:
>
> 1.	Single-recipient catch and ignore cases.
>
> 2.	Single-recipient stop/continue cases.
>
> 3.	Single-recipient fatal cases.
>
> 4.	Single-process multi-threaded stop/continue cases.
>
> 5.	Single-process multi-threaded fatal cases.
>
> 6.	And on to process-group cases.

Paul, I am not an expert at all, but honestly I don't see how
this could be achieved. This lock is heavily overloaded for
quite different purposes. I think that maybe it makes sense
to try other steps, for example (random order):

1. Tasklist protects ->sighand changing (de_thread) - rework
   sighand access/locking.

2. Tasklist protects reparenting - fix this.

...

N. PTRACE!!! Well, I close my eyes immediately when I see this
   word in the sources.

Only then we can eliminate tasklist locking from signal sending
path. But I don't see the easy way to solve any of these 1 - N
problems.

Currently I don't see how your patch could be "fixed" for SIGCONT
case, except very ugly:

	kill_proc_info(sig)
	{
		p = find_task_by_pid(pid);
		if (sig == SIGCONT)
			read_lock(&tasklist_lock);

		error = group_send_sig_info(...);
		...
	}

But there are other problems too.

Look at __group_complete_signal(), it scans p->pids[PIDTYPE_TGID].pid_list
list to find a suitable thread. What if 'p' does clone(CLONE_THREAD) now?
Let's look at copy_process(), it does attach_pid(p, PIDTYPE_TGID, p->tgid)
under the lovely tasklist_lock again.

So, I don't believe we can solve even the simplest case (single-recipient,
non fatal, non stop/cont) without significant locking rework.

I hope that your patch will stimulate this work.

Oleg.
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Thu, Aug 18, 2005 at 03:48:00PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > On Wed, Aug 17, 2005 at 06:35:03PM +0400, Oleg Nesterov wrote:
> > >
> > > Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
> > > so we always need to lock one ->sighand. Could you please clarify?
> >
> > On the #3 and #4 code paths, the code assumes that the task-list lock
> > is held.  So I was thinking of something (very roughly) as follows:
> >
> > #define SIGLOCK_HOLD_RCU	(1 << 0)
> > #define SIGLOCK_HOLD_TASKLIST	(1 << 1)
> > #define SIGLOCK_HOLD_SIGLOCK	(1 << 2)
>
> Oh, no, sorry for confusion.
>
> I meant this function should only lock ->sighand, nothing more, something
> like this:
>
> 	// must be called with preemption disabled !!!
> 	struct sighand_struct *lock_task_sighand(struct task_struct *tsk, unsigned long *flags)
> 	{
> 		struct sighand_struct *sighand;
>
> 		// sighand = NULL;
> 		// if (!get_task_struct_rcu(tsk))
> 		//	goto out;
>
> 		for (;;) {
> 			sighand = tsk->sighand;
> 			if (unlikely(sighand == NULL))
> 				break;
>
> 			spin_lock_irqsave(&sighand->siglock, *flags);
>
> 			if (likely(sighand == tsk->sighand))
> 				goto out;
>
> 			spin_unlock_irqrestore(&sighand->siglock, *flags);
> 		}
>
> 		// put_task_struct(tsk);
> 	out:
> 		return sighand;
> 	}
>
> 	static inline void unlock_task_sighand(struct task_struct *tsk, unsigned long *flags)
> 	{
> 		spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
> 		// put_task_struct(tsk);
> 	}
>
> 	int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
> 	{
> 		unsigned long flags;
> 		int ret;
>
> 		ret = check_kill_permission(sig, info, p);
> 		if (!ret && sig) {
> 			ret = -ESRCH;
> 			if (lock_task_sighand(p, &flags)) {
> 				ret = __group_send_sig_info(sig, info, p);
> 				unlock_task_sighand(p, &flags);
> 			}
> 		}
>
> 		return ret;
> 	}
>
> Currently the only user of it will be group_send_sig_info(),

Well, that is one reason that I didn't think that you were talking
about lock_task_sighand() working only on siglock...
> but I hope
> you have devil plans to kill the "tasklist_lock guards the very rare
> ->sighand change" finally :)

;-)  I have indeed been thinking along these lines, but all of the
devil plans that I have come up thus far are quite intrusive, and might
have some performance problems in some situations.  So it seems best to
remove tasklist_lock in steps:

1.	Single-recipient catch and ignore cases.

2.	Single-recipient stop/continue cases.

3.	Single-recipient fatal cases.

4.	Single-process multi-threaded stop/continue cases.

5.	Single-process multi-threaded fatal cases.

6.	And on to process-group cases.

There are a number of ways of handling this, and they all have some
scary aspects (and here you thought the current patches were scary!!!):

1.	Provide a single signal-reception data structure for each
	multi-recipient case: process for multithreaded processes,
	process-group leader for process-group signals, controlling
	terminal for tty-based signals.

	The task delivering the signal records the signal (perhaps
	queuing it) on the signal-reception data structure, and sends
	an IPI to each CPU currently running one of the affected tasks.
	What about the tasks not currently running?  There are a number
	of devil-planlet alternatives possible here:

	a.	Have a daemon responsible for waking up any signaled
		tasks (assuming that they are blocked or sleeping
		interruptibly).

	b.	Have the tasks running on the CPUs each wake up one
		sleeping task, who in turn wake up another once they
		receive the signal, and so on.

	c.	Have tasks check their signal-reception data structures
		(might have up to three: process, process group,
		controlling terminal).  Hopefully can get by with a
		very few such checks, but...

	What to do if several signals arrive at about the same time?
	What about consistent ordering of signals to different groups
	with partially overlapping task membership?  Are we allowed to
	see different signal orderings for different tasks sharing two
	different groupings?
	Devil-planlet alternatives:

	a.	Serialize on controlling terminal (but this might as
		well be a global lock on single-user systems).

	b.	Allow signals to arrive in different orders at
		different threads of a process, if the different
		signals are targeted at different groups (e.g., proc
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
(Replying to wrong message, sorry).

Thomas Gleixner wrote:
>
> --- linux-2.6.13-rc6-rt8/kernel/fork.c	2005-08-17 12:57:08.0 +0200
> +++ linux-2.6.13-rc6-rt/kernel/fork.c	2005-08-17 11:17:46.0 +0200
> @@ -1198,7 +1198,8 @@ bad_fork_cleanup_mm:
>  bad_fork_cleanup_signal:
>  	exit_signal(p);
>  bad_fork_cleanup_sighand:
> -	exit_sighand(p);
> +	if (p->sighand) /* exit_signal() could have freed p->sighand */
> +		exit_sighand(p);

Looks like now it is the only user of exit_signal(), and I think we can
kill this function and just do:

	bad_fork_cleanup_sighand:
		if (p->sighand) {
			// p->sighand can't change here, we don't need tasklist lock.
			//
			// If we get here we are not sharing ->sighand with anybody
			// else. It means, in particular, that p had no CLONE_THREAD
			// flag. Nobody can see this process yet, we didn't call
			// attach_pid(), otherwise ->sighand was freed from
			// __exit_signal. Thus nobody can see this sighand.
			if (atomic_dec_and_test(&p->sighand->count))
				sighand_free(p->sighand);
		}

It is not an optimization, this path is rare, just to make things more
clear and to reduce "false positives" from grep tasklist_lock.

And I think it makes sense to

	#define put_sighand(sig)					\
		do {							\
			if (atomic_dec_and_test(&(sig)->count))		\
				sighand_free(sig);			\
		} while (0)

Oleg.
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
Paul E. McKenney wrote:
>
> On Wed, Aug 17, 2005 at 06:35:03PM +0400, Oleg Nesterov wrote:
> >
> > Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
> > so we always need to lock one ->sighand. Could you please clarify?
>
> On the #3 and #4 code paths, the code assumes that the task-list lock
> is held.  So I was thinking of something (very roughly) as follows:
>
> #define SIGLOCK_HOLD_RCU	(1 << 0)
> #define SIGLOCK_HOLD_TASKLIST	(1 << 1)
> #define SIGLOCK_HOLD_SIGLOCK	(1 << 2)

Oh, no, sorry for confusion.

I meant this function should only lock ->sighand, nothing more, something
like this:

	// must be called with preemption disabled !!!
	struct sighand_struct *lock_task_sighand(struct task_struct *tsk, unsigned long *flags)
	{
		struct sighand_struct *sighand;

		// sighand = NULL;
		// if (!get_task_struct_rcu(tsk))
		//	goto out;

		for (;;) {
			sighand = tsk->sighand;
			if (unlikely(sighand == NULL))
				break;

			spin_lock_irqsave(&sighand->siglock, *flags);

			if (likely(sighand == tsk->sighand))
				goto out;

			spin_unlock_irqrestore(&sighand->siglock, *flags);
		}

		// put_task_struct(tsk);
	out:
		return sighand;
	}

	static inline void unlock_task_sighand(struct task_struct *tsk, unsigned long *flags)
	{
		spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
		// put_task_struct(tsk);
	}

	int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
	{
		unsigned long flags;
		int ret;

		ret = check_kill_permission(sig, info, p);
		if (!ret && sig) {
			ret = -ESRCH;
			if (lock_task_sighand(p, &flags)) {
				ret = __group_send_sig_info(sig, info, p);
				unlock_task_sighand(p, &flags);
			}
		}

		return ret;
	}

Currently the only user of it will be group_send_sig_info(), but I hope
you have devil plans to kill the "tasklist_lock guards the very rare
->sighand change" finally :)

Oleg.
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Wed, Aug 17, 2005 at 06:35:03PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > > The other thing that jumped out at me is that signals are very different
> > > animals from a locking viewpoint depending on whether they are:
> > >
> > > 1.	ignored,
> > >
> > > 2.	caught by a single thread,
> > >
> > > 3.	fatal to multiple threads/processes (though I don't know
> > >	of anything that shares sighand_struct between separate
> > >	processes), or
> > >
> > > 4.	otherwise global to multiple threads/processes (such as
> > >	SIGSTOP and SIGCONT).
> > >
> > > And there are probably other distinctions that I have not yet caught
> > > on to.
> > >
> > > One way to approach this would be to make your suggested lock_task_sighand()
> > > look at the signal and acquire the appropriate locks.  If, having acquired
> > > a given set of locks, it found that the needed set had changed (e.g., due
> > > to racing exec() or sigaction()), then it drops the locks and retries.
> >
> > OK, for this sort of approach to work, lock_task_sighand() must take and
> > return some sort of mask indicating what locks are held.  The mask returned
> > by lock_task_sighand() would then be passed to an unlock_task_sighand().
>
> Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
> so we always need to lock one ->sighand. Could you please clarify?

On the #3 and #4 code paths, the code assumes that the task-list lock
is held.
So I was thinking of something (very roughly) as follows:

	#define SIGLOCK_HOLD_RCU	(1 << 0)
	#define SIGLOCK_HOLD_TASKLIST	(1 << 1)
	#define SIGLOCK_HOLD_SIGLOCK	(1 << 2)

	int lock_task_sighand(int sig, struct task_struct *p, int locksheld,
			      struct sighand_struct **spp, int *flags)
	{
		int locksret = 0;
		struct sighand_struct *sp;

	retry:
		if (!(locksheld & SIGLOCK_HOLD_RCU)) {
			locksret |= SIGLOCK_HOLD_RCU;
			rcu_read_lock();
		}
		sp = rcu_dereference(p->sighand);
		if (sp == NULL) {
			unlock_task_sighand(NULL, locksret, *flags);
			*spp = NULL;
			return 0;
		}
		if (!(locksheld & SIGLOCK_HOLD_TASKLIST)) {
			/* Complain if siglock held. */
			if (sig_kernel_stop(sig) /* expand for other conditions */) {
				locksret |= SIGLOCK_HOLD_TASKLIST;
				read_lock(&tasklist_lock);
			}
		}
		if (!(locksheld & SIGLOCK_HOLD_SIGLOCK)) {
			locksret |= SIGLOCK_HOLD_SIGLOCK;
			spin_lock_irqsave(&sp->siglock, *flags);
			if (p->sighand != sp) {
				unlock_task_sighand(sp, locksret, *flags);
				goto retry;
			}
		} else {
			*flags = 0;
		}
		*spp = sp;
		return locksret;
	}

	void unlock_task_sighand(struct sighand_struct *sp, int lockstodrop, int flags)
	{
		/* lockstodrop had better be non-negative! */
		if (lockstodrop & SIGLOCK_HOLD_SIGLOCK) {
			spin_unlock_irqrestore(&sp->siglock, flags);
		}
		if (lockstodrop & SIGLOCK_HOLD_TASKLIST) {
			read_unlock(&tasklist_lock);
		}
		if (lockstodrop & SIGLOCK_HOLD_RCU) {
			rcu_read_unlock();
		}
	}

The "expand for other conditions" must also cover signals that cause
coredumps, that kill all threads in a process, or that otherwise affect
more than one thread (e.g., kill with a negative signal number).

I am assuming that your argument about not needing the
get_task_struct_rcu() will eventually work its way through my skull. ;-)

Of course, the "expand for other conditions" requires reference to the
sighand_struct.  And for that, you really need to be holding siglock.
But you have to drop siglock to acquire tasklist_lock.
So we will end up relying on RCU some more to safely peek into
sighand_struct -before- acquiring siglock -- which is why I snapshot
p->sighand so early; it will need to be referenced when checking "other
conditions".

I am not exactly happy with the above pair of functions, but I suspect
that they might -really- simplify things a bit.  The usage would be as
follows:

	if ((lockstodrop = lock_task_sighand(sig, tsk, 0, &sp, &flags)) < 0)
		return lockstodrop;  /* error code */
	if (sp != NULL) {
		/* invoke the function to send the signal */
	}
	unlock_task_sighand(sp, lockstodrop, flags);

Thoughts?  Hey, you asked for this!!!  ;-)

> > > > +	if (!ret && sig && (sp = p->sighand)) {
> > > >  		if (!get_task_struct_rcu(p)) {
> > > >  			return -ESRCH;
> > > >  		}
> > > > -		spin_lock_irqsave(&p->sighand->siglock, flags);
> > > > +		spin_lock_irqsave(&sp->siglock, flags);
> > > > +		if (p->sighand != sp) {
> > > > +			spin_un
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
Paul E. McKenney wrote:
>
> > The other thing that jumped out at me is that signals are very different
> > animals from a locking viewpoint depending on whether they are:
> >
> > 1.	ignored,
> >
> > 2.	caught by a single thread,
> >
> > 3.	fatal to multiple threads/processes (though I don't know
> >	of anything that shares sighand_struct between separate
> >	processes), or
> >
> > 4.	otherwise global to multiple threads/processes (such as
> >	SIGSTOP and SIGCONT).
> >
> > And there are probably other distinctions that I have not yet caught
> > on to.
> >
> > One way to approach this would be to make your suggested lock_task_sighand()
> > look at the signal and acquire the appropriate locks.  If, having acquired
> > a given set of locks, it found that the needed set had changed (e.g., due
> > to racing exec() or sigaction()), then it drops the locks and retries.
>
> OK, for this sort of approach to work, lock_task_sighand() must take and
> return some sort of mask indicating what locks are held.  The mask returned
> by lock_task_sighand() would then be passed to an unlock_task_sighand().

Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
so we always need to lock one ->sighand. Could you please clarify?

> > > +	if (!ret && sig && (sp = p->sighand)) {
> > >  		if (!get_task_struct_rcu(p)) {
> > >  			return -ESRCH;
> > >  		}
> > > -		spin_lock_irqsave(&p->sighand->siglock, flags);
> > > +		spin_lock_irqsave(&sp->siglock, flags);
> > > +		if (p->sighand != sp) {
> > > +			spin_unlock_irqrestore(&sp->siglock, flags);
> > > +			put_task_struct(p);
> > > +			goto retry;
> > > +		}
> > >  		ret = __group_send_sig_info(sig, info, p);
> > > -		spin_unlock_irqrestore(&p->sighand->siglock, flags);
> > > +		spin_unlock_irqrestore(&sp->siglock, flags);
> > >  		put_task_struct(p);
>
> Do we really need get_task_struct_rcu/put_task_struct here?
>
> The task_struct can't go away under us, it is rcu protected.
> > When ->sighand is locked, and it is still the same after
> > the re-check, it means that 'p' has not done __exit_signal()
> > yet, so it is safe to send the signal.
> >
> > And if the task has ->usage == 0, it means that it also has
> > ->sighand == NULL, and your code will notice that.
> >
> > No?
>
> Seems plausible.  I got paranoid after seeing the lock dropped in
> handle_stop_signal(), though.

Yes, this is bad and should be fixed, I agree.

But why do you think we need to bump task->usage? It can't make
any difference, afaics. The task_struct can't disappear, the caller
was converted to use rcu_read_lock() or it holds tasklist_lock.
Nonzero task_struct->usage can't stop do_exit or sys_wait4, it will
only postpone call_rcu(__put_task_struct_cb).

And after we locked ->sighand we have a sufficient memory barrier, so
if we read a stale value into 'sp' we will notice that (if you were
worried about this issue).

Have I missed something?

> 	void exit_sighand(struct task_struct *tsk)
> 	{
> 		write_lock_irq(&tasklist_lock);
> -		__exit_sighand(tsk);
> +		spin_lock(&tsk->sighand->siglock);
> +		if (tsk->sighand != NULL) {
> +			__exit_sighand(tsk);
> +		}
> +		spin_unlock(&tsk->sighand->siglock);
> 		write_unlock_irq(&tasklist_lock);
> 	}

Very strange code. Why this check? And what happens with

	spin_unlock(&tsk->sighand->siglock);

when tsk->sighand == NULL ?

Oleg.
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
* Paul E. McKenney <[EMAIL PROTECTED]> wrote:

> My tests are not finding even glaring races, so time to go and create
> some torture tests before getting too much more elaborate.  10,000
> eyes are nice (and Oleg's eyes do seem to be working especially well),
> but a good software-test sledgehammer has its uses as well.

i've merged this to the -rt tree, and find below a delta patch relative
to the previous patch.

	Ingo

Index: linux/kernel/exit.c
===
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -74,7 +74,6 @@ repeat:
 	__ptrace_unlink(p);
 	BUG_ON(!list_empty(&p->ptrace_list) || !list_empty(&p->ptrace_children));
 	__exit_signal(p);
-	__exit_sighand(p);
 	/*
 	 * Note that the fastpath in sys_times depends on __exit_signal having
 	 * updated the counters before a task is removed from the tasklist of
Index: linux/kernel/signal.c
===
--- linux.orig/kernel/signal.c
+++ linux/kernel/signal.c
@@ -328,17 +328,19 @@ void __exit_sighand(struct task_struct *
 	struct sighand_struct * sighand = tsk->sighand;
 
 	/* Ok, we're done with the signal handlers */
-	spin_lock(&sighand->siglock);
 	tsk->sighand = NULL;
 	if (atomic_dec_and_test(&sighand->count))
 		sighand_free(sighand);
-	spin_unlock(&sighand->siglock);
 }
 
 void exit_sighand(struct task_struct *tsk)
 {
 	write_lock_irq(&tasklist_lock);
-	__exit_sighand(tsk);
+	spin_lock(&tsk->sighand->siglock);
+	if (tsk->sighand != NULL) {
+		__exit_sighand(tsk);
+	}
+	spin_unlock(&tsk->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
 }
 
@@ -361,6 +363,7 @@ void __exit_signal(struct task_struct *t
 		if (tsk == sig->curr_target)
 			sig->curr_target = next_thread(tsk);
 		tsk->signal = NULL;
+		__exit_sighand(tsk);
 		spin_unlock(&sighand->siglock);
 		flush_sigqueue(&sig->shared_pending);
 	} else {
@@ -392,6 +395,7 @@ void __exit_signal(struct task_struct *t
 		sig->nvcsw += tsk->nvcsw;
 		sig->nivcsw += tsk->nivcsw;
 		sig->sched_time += tsk->sched_time;
+		__exit_sighand(tsk);
 		spin_unlock(&sighand->siglock);
 		sig = NULL;	/* Marker for below. */
 	}
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Tue, Aug 16, 2005 at 10:07:14AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 16, 2005 at 03:56:05PM +0400, Oleg Nesterov wrote:
> > Paul E. McKenney wrote:
> > >
> > > OK, the attached instead revalidates that the task struct still references
> > > the sighand_struct after obtaining the lock
> >
> > Personally I think this is a way to go. A nitpick suggestion,
> > could you make a separate function (say, lock_task_sighand)
> > which does all this job?
>
> Sounds good, will do!
>
> The other thing that jumped out at me is that signals are very different
> animals from a locking viewpoint depending on whether they are:
>
> 1.	ignored,
>
> 2.	caught by a single thread,
>
> 3.	fatal to multiple threads/processes (though I don't know
>	of anything that shares sighand_struct between separate
>	processes), or
>
> 4.	otherwise global to multiple threads/processes (such as
>	SIGSTOP and SIGCONT).
>
> And there are probably other distinctions that I have not yet caught
> on to.
>
> One way to approach this would be to make your suggested lock_task_sighand()
> look at the signal and acquire the appropriate locks.  If, having acquired
> a given set of locks, it found that the needed set had changed (e.g., due
> to racing exec() or sigaction()), then it drops the locks and retries.
>
> Does this make sense?
>
> This approach assumes that realtime latency (of the kill() operation
> itself) is critical only in cases #1 and #2 above.  This makes sense
> to me, but some of you might know of situations where #3 and #4 are
> important.  But I am hoping not.  ;-)

OK, for this sort of approach to work, lock_task_sighand() must take and
return some sort of mask indicating what locks are held.  The mask returned
by lock_task_sighand() would then be passed to an unlock_task_sighand().

Another alternative would be to make a macro that allowed an arbitrary
statement to be wrapped up in it.

Neither of these is as appealing as I would like -- other ideas?
I have not yet attacked this in the attached patch.

> > > > and there are some remaining problems
> > > > that I need to sort out, including:
> > > ...
> > > >
> > > > o	Some of the functions invoked by __group_send_sig_info(),
> > > >	including handle_stop_signal(), momentarily drop ->siglock.
> >
> > Just to be sure that one point doesn't escape your attention, this:
> >
> > > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c	2005-08-14 19:53:28.0 -0700
> > > @@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct *
> > >  	struct sighand_struct * sighand = tsk->sighand;
> > >
> > >  	/* Ok, we're done with the signal handlers */
> > > +	spin_lock(&sighand->siglock);
> > >  	tsk->sighand = NULL;
> > >  	if (atomic_dec_and_test(&sighand->count))
> > > -		kmem_cache_free(sighand_cachep, sighand);
> > > +		sighand_free(sighand);
> > > +	spin_unlock(&sighand->siglock);
> >
> > is not enough (and unneeded). Unless I missed something, we have
> > a race:
> >
> > release_task:
> >
> > 	__exit_signal:
> > 		spin_lock(sighand);
> > 		spin_unlock(sighand);
> > 		flush_sigqueue(&sig->shared_pending);
> > 		kmem_cache_free(tsk->signal);
> > 				// here comes group_send_sig_info(),
> > 				// locks ->sighand, delivers the signal
> > 				// to the ->shared_pending.
> > 				// siginfo leaked, or crash.
> >
> > 	__exit_sighand:
> > 		spin_lock(sighand);
> > 		tsk->sighand = NULL;	// too late
> >
> > I think that release_task() should not use __exit_sighand()
> > at all. Instead, __exit_signal() should set tsk->sighand = NULL
> > under ->sighand->lock.
>
> Will look into this -- I was inserting the locking to handle a race with
> my revalidation.  It looks like I also need to pay some more attention
> to the race with exiting tasks, good catch!  Your suggestion of invoking
> __exit_sighand() from under siglock within __exit_signal() sounds good
> at first glance, will think it through.

This turned out to be easy, the attached patch covers it.
> > > int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
> > > {
> > >  	unsigned long flags;
> > > +	struct sighand_struct *sp;
> > >  	int ret;
> > >
> > > +retry:
> > >  	ret = check_kill_permission(sig, info, p);
> > > -	if (!ret && sig && p->sighand) {
> > > +	if (!ret && sig && (sp = p->sighand)) {
> > >  		if (!get_task_struct_rcu(p)) {
> > >  			return -ESRCH;
> > >  		}
> > > -		spin_lock_irqsave(&p->sighand->siglock, flags);
> > > +		spin_lock_irqsave(&sp->siglock, flags);
> > > +		if (p->sighand != sp) {
> > > +			spin_unlock_irqrestore(&sp->siglock, flags);
> > > +			put_task_struct(p);
>
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Tue, Aug 16, 2005 at 03:56:05PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > OK, the attached instead revalidates that the task struct still references
> > the sighand_struct after obtaining the lock
>
> Personally I think this is a way to go. A nitpick suggestion,
> could you make a separate function (say, lock_task_sighand)
> which does all this job?

Sounds good, will do!

The other thing that jumped out at me is that signals are very different
animals from a locking viewpoint depending on whether they are:

1.	ignored,

2.	caught by a single thread,

3.	fatal to multiple threads/processes (though I don't know
	of anything that shares sighand_struct between separate
	processes), or

4.	otherwise global to multiple threads/processes (such as
	SIGSTOP and SIGCONT).

And there are probably other distinctions that I have not yet caught
on to.

One way to approach this would be to make your suggested lock_task_sighand()
look at the signal and acquire the appropriate locks.  If, having acquired
a given set of locks, it found that the needed set had changed (e.g., due
to racing exec() or sigaction()), then it drops the locks and retries.

Does this make sense?

This approach assumes that realtime latency (of the kill() operation
itself) is critical only in cases #1 and #2 above.  This makes sense to
me, but some of you might know of situations where #3 and #4 are
important.  But I am hoping not.  ;-)

> > and there are some remaining problems
> > that I need to sort out, including:
> ...
> >
> > o	Some of the functions invoked by __group_send_sig_info(),
> >	including handle_stop_signal(), momentarily drop ->siglock.
> Just to be sure that one point doesn't escape your attention, this:
>
> > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c	2005-08-14 19:53:28.0 -0700
> > @@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct *
> >  	struct sighand_struct * sighand = tsk->sighand;
> >
> >  	/* Ok, we're done with the signal handlers */
> > +	spin_lock(&sighand->siglock);
> >  	tsk->sighand = NULL;
> >  	if (atomic_dec_and_test(&sighand->count))
> > -		kmem_cache_free(sighand_cachep, sighand);
> > +		sighand_free(sighand);
> > +	spin_unlock(&sighand->siglock);
>
> is not enough (and unneeded). Unless I missed something, we have
> a race:
>
> release_task:
>
> 	__exit_signal:
> 		spin_lock(sighand);
> 		spin_unlock(sighand);
> 		flush_sigqueue(&sig->shared_pending);
> 		kmem_cache_free(tsk->signal);
> 				// here comes group_send_sig_info(),
> 				// locks ->sighand, delivers the signal
> 				// to the ->shared_pending.
> 				// siginfo leaked, or crash.
>
> 	__exit_sighand:
> 		spin_lock(sighand);
> 		tsk->sighand = NULL;	// too late
>
> I think that release_task() should not use __exit_sighand()
> at all. Instead, __exit_signal() should set tsk->sighand = NULL
> under ->sighand->lock.

Will look into this -- I was inserting the locking to handle a race with
my revalidation.  It looks like I also need to pay some more attention
to the race with exiting tasks, good catch!  Your suggestion of invoking
__exit_sighand() from under siglock within __exit_signal() sounds good
at first glance, will think it through.
> > int group_send_sig_info(int sig, struct siginfo *info, struct task_struct > > *p) > > { > > unsigned long flags; > > + struct sighand_struct *sp; > > int ret; > > > > +retry: > > ret = check_kill_permission(sig, info, p); > > - if (!ret && sig && p->sighand) { > > + if (!ret && sig && (sp = p->sighand)) { > > if (!get_task_struct_rcu(p)) { > > return -ESRCH; > > } > > - spin_lock_irqsave(&p->sighand->siglock, flags); > > + spin_lock_irqsave(&sp->siglock, flags); > > + if (p->sighand != sp) { > > + spin_unlock_irqrestore(&sp->siglock, flags); > > + put_task_struct(p); > > + goto retry; > > + } > > ret = __group_send_sig_info(sig, info, p); > > - spin_unlock_irqrestore(&p->sighand->siglock, flags); > > + spin_unlock_irqrestore(&sp->siglock, flags); > > put_task_struct(p); > > Do we really need get_task_struct_rcu/put_task_struct here? > > The task_struct can't go away under us, it is rcu protected. > When ->sighand is locked, and it is still the same after > the re-check, it means that 'p' has not done __exit_signal() > yet, so it is safe to send the signal. > > And if the task has ->usage == 0, it means that it also has > ->sighand == NULL, and your code will notice that. > > No? Seems plau
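The lock-then-revalidate shape this subthread converges on (Oleg's proposed lock_task_sighand() helper wrapping Paul's retry loop) can be sketched as a userspace analogue. This is illustrative only, not kernel code: pthread mutexes stand in for ->siglock, and a plain pointer read stands in for rcu_dereference(); all names and types are invented for the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct sighand {
	pthread_mutex_t siglock;
};

struct task {
	struct sighand *sighand;	/* swapped by a racing exec(), NULL after exit */
};

/* Return the task's sighand with its lock held, or NULL if the task
 * has already dropped its sighand (i.e., is exiting). */
static struct sighand *lock_task_sighand(struct task *t)
{
	struct sighand *sp;

	for (;;) {
		sp = t->sighand;		/* kernel code would use rcu_dereference() */
		if (sp == NULL)
			return NULL;		/* task already ran its exit path */
		pthread_mutex_lock(&sp->siglock);
		if (t->sighand == sp)
			return sp;		/* still current: locked and validated */
		pthread_mutex_unlock(&sp->siglock);	/* lost a race; retry */
	}
}

static void unlock_task_sighand(struct sighand *sp)
{
	pthread_mutex_unlock(&sp->siglock);
}
```

The point of the revalidation is that between reading t->sighand and acquiring the lock, an exec() may have replaced the structure; re-checking under the lock guarantees the caller holds the lock the task currently uses.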
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
Paul E. McKenney wrote:
>
> OK, the attached instead revalidates that the task struct still references
> the sighand_struct after obtaining the lock

Personally I think this is a way to go. A nitpick suggestion,
could you make a separate function (say, lock_task_sighand)
which does all this job?

> > and there are some remaining problems
> > that I need to sort out, including:
> ...
> > o	Some of the functions invoked by __group_send_sig_info(),
> >	including handle_stop_signal(), momentarily drop ->siglock.

Just to be sure that one point doesn't escape your attention, this:

> +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c
> 2005-08-14 19:53:28.0 -0700
> @@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct *
> 	struct sighand_struct * sighand = tsk->sighand;
>
> 	/* Ok, we're done with the signal handlers */
> +	spin_lock(&sighand->siglock);
> 	tsk->sighand = NULL;
> 	if (atomic_dec_and_test(&sighand->count))
> -		kmem_cache_free(sighand_cachep, sighand);
> +		sighand_free(sighand);
> +	spin_unlock(&sighand->siglock);

is not enough (and unneeded). Unless I missed something, we have
a race:

release_task:

	__exit_signal:
		spin_lock(sighand);
		spin_unlock(sighand);
		flush_sigqueue(&sig->shared_pending);
		kmem_cache_free(tsk->signal);

			// here comes group_send_sig_info(), locks ->sighand,
			// delivers the signal to the ->shared_pending.
			// siginfo leaked, or crash.

	__exit_sighand:
		spin_lock(sighand);
		tsk->sighand = NULL;	// too late

I think that release_task() should not use __exit_sighand()
at all. Instead, __exit_signal() should set tsk->sighand = NULL
under ->sighand->lock.
> int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p) > { > unsigned long flags; > + struct sighand_struct *sp; > int ret; > > +retry: > ret = check_kill_permission(sig, info, p); > - if (!ret && sig && p->sighand) { > + if (!ret && sig && (sp = p->sighand)) { > if (!get_task_struct_rcu(p)) { > return -ESRCH; > } > - spin_lock_irqsave(&p->sighand->siglock, flags); > + spin_lock_irqsave(&sp->siglock, flags); > + if (p->sighand != sp) { > + spin_unlock_irqrestore(&sp->siglock, flags); > + put_task_struct(p); > + goto retry; > + } > ret = __group_send_sig_info(sig, info, p); > - spin_unlock_irqrestore(&p->sighand->siglock, flags); > + spin_unlock_irqrestore(&sp->siglock, flags); > put_task_struct(p); Do we really need get_task_struct_rcu/put_task_struct here? The task_struct can't go away under us, it is rcu protected. When ->sighand is locked, and it is still the same after the re-check, it means that 'p' has not done __exit_signal() yet, so it is safe to send the signal. And if the task has ->usage == 0, it means that it also has ->sighand == NULL, and your code will notice that. No? Oleg. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
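Oleg's point above -- clear tsk->sighand under ->siglock *before* freeing the shared signal state, so a sender that locks and revalidates (as in the patched group_send_sig_info()) either delivers safely or sees NULL and backs off -- can be modeled in a small userspace sketch. Everything here is a simplified stand-in with invented names, not kernel code; -1 plays the role of -ESRCH.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

struct sighand { pthread_mutex_t siglock; };
struct signal  { int pending; };

struct task {
	struct sighand *sighand;
	struct signal  *signal;
};

/* Exit path, per Oleg's suggested ordering: publish sighand == NULL
 * under the lock, and only then free the shared signal state. */
static void exit_signals(struct task *t)
{
	struct sighand *sp = t->sighand;

	pthread_mutex_lock(&sp->siglock);
	t->sighand = NULL;
	pthread_mutex_unlock(&sp->siglock);

	free(t->signal);		/* no validated sender can reach this now */
	t->signal = NULL;
}

/* Sender path: lock, revalidate, then deliver. */
static int send_sig(struct task *t)
{
	struct sighand *sp = t->sighand;

	if (sp == NULL)
		return -1;
	pthread_mutex_lock(&sp->siglock);
	if (t->sighand != sp) {		/* exited (or exec'd) meanwhile */
		pthread_mutex_unlock(&sp->siglock);
		return -1;
	}
	t->signal->pending++;		/* safe: signal not yet freed */
	pthread_mutex_unlock(&sp->siglock);
	return 0;
}
```

With the buggy ordering (free first, clear the pointer later), the sender's revalidation would still succeed while t->signal points at freed memory -- the siginfo leak or crash described above.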
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
* Paul E. McKenney <[EMAIL PROTECTED]> wrote: > OK, the attached instead revalidates that the task struct still > references the sighand_struct after obtaining the lock (and passes > kernbench and LTP, which tells me I need to get better tests!). i've applied this to the -RT tree, and it's looking good so far from a basic stability POV. Ingo
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Fri, Aug 12, 2005 at 12:51:17PM +0400, Oleg Nesterov wrote: > Paul E. McKenney wrote: > > > > --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c 2005-08-11 > > 11:44:55.0 -0700 > > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c > > 2005-08-11 12:26:45.0 -0700 > > [ ... snip ... ] > > @@ -785,11 +787,13 @@ no_thread_group: > > recalc_sigpending(); > > > > + oldsighand->deleted = 1; > > + oldsighand->successor = newsighand; > > I don't think this is correct. > > This ->oldsighand can be shared with another CLONE_SIGHAND > process and will not be deleted, just unshared. > > When the signal is sent to that process we must use ->oldsighand > for locking, not the oldsighand->successor == newsighand. OK, the attached instead revalidates that the task struct still references the sighand_struct after obtaining the lock (and passes kernbench and LTP, which tells me I need to get better tests!). However, your quick responses motivated me to read more of the code (yes, I know, I should have done -that- beforehand!!!), and there are some remaining problems that I need to sort out, including: o Fatal signals affect all threads in a process, and the existing code looks like it assumes that tasklist_lock is held during delivery of such signals. In this patch, that assumption does not hold. o Some of the functions invoked by __group_send_sig_info(), including handle_stop_signal(), momentarily drop ->siglock. I will be looking at a few approaches to handle these situations, but, in the meantime, any other combinations that I missed? 
Thanx, Paul diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c 2005-08-11 11:44:55.0 -0700 +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c 2005-08-12 21:18:33.0 -0700 @@ -790,7 +790,7 @@ no_thread_group: write_unlock_irq(&tasklist_lock); if (atomic_dec_and_test(&oldsighand->count)) - kmem_cache_free(sighand_cachep, oldsighand); + sighand_free(oldsighand); } BUG_ON(!thread_group_empty(current)); diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h 2005-08-11 11:44:57.0 -0700 +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h 2005-08-12 21:19:34.0 -0700 @@ -450,8 +450,16 @@ struct sighand_struct { atomic_tcount; struct k_sigaction action[_NSIG]; spinlock_t siglock; + struct rcu_head rcu; }; +static inline void sighand_free(struct sighand_struct *sp) +{ + extern void sighand_free_cb(struct rcu_head *rhp); + + call_rcu(&sp->rcu, sighand_free_cb); +} + /* * NOTE! 
"signal_struct" does not have it's own * locking, because a shared signal_struct always diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c 2005-08-11 11:44:57.0 -0700 +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c 2005-08-12 21:24:00.0 -0700 @@ -43,6 +43,7 @@ #include #include #include +#include #include #include @@ -769,6 +770,14 @@ int unshare_files(void) EXPORT_SYMBOL(unshare_files); +void sighand_free_cb(struct rcu_head *rhp) +{ + struct sighand_struct *sp = + container_of(rhp, struct sighand_struct, rcu); + + kmem_cache_free(sighand_cachep, sp); +} + static inline int copy_sighand(unsigned long clone_flags, struct task_struct * tsk) { struct sighand_struct *sig; diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c 2005-08-11 11:44:57.0 -0700 +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c 2005-08-14 19:53:28.0 -0700 @@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct * struct sighand_struct * sighand = tsk->sighand; /* Ok, we're done with the signal handlers */ + spin_lock(&sighand->siglock); tsk->sighand = NULL; if (atomic_dec_and_test(&sighand->count)) - kmem_cache_free(sighand_cachep, sighand); + sighand_free(sighand); + spin_unlock(&
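The sighand_free()/sighand_free_cb() pairing in the patch above defers the actual kmem_cache_free() until an RCU grace period has elapsed. A toy userspace analogue of that callback-deferral shape follows; the explicit drain step stands in for a real grace period, the free counter exists purely for demonstration, and container_of() is open-coded with offsetof(). None of this is kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *);
};

static struct rcu_head *pending_cbs;

/* Queue a callback instead of freeing right away. */
static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *))
{
	head->func = func;
	head->next = pending_cbs;
	pending_cbs = head;
}

/* Stand-in for "a grace period elapsed": run everything queued. */
static void rcu_drain(void)
{
	struct rcu_head *h = pending_cbs;

	pending_cbs = NULL;
	while (h != NULL) {
		struct rcu_head *next = h->next;
		h->func(h);
		h = next;
	}
}

struct sighand {
	int count;
	struct rcu_head rcu;
};

static int frees;	/* demo only: counts completed deferred frees */

static void sighand_free_cb(struct rcu_head *rhp)
{
	/* container_of(), open-coded */
	struct sighand *sp =
		(struct sighand *)((char *)rhp - offsetof(struct sighand, rcu));

	free(sp);
	frees++;
}

static void sighand_free(struct sighand *sp)
{
	call_rcu(&sp->rcu, sighand_free_cb);
}
```

The payoff is that a concurrent RCU reader holding a stale sighand pointer keeps a valid (if outdated) structure until the grace period ends, rather than a dangling one.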
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Fri, Aug 12, 2005 at 08:36:00AM +0200, Ingo Molnar wrote: > > * Lee Revell <[EMAIL PROTECTED]> wrote: > > > Doesn't this fix the longest latency we were seeing with > > PREEMPT_DESKTOP, I don't have a trace handy but the upshot was "signal > > delivery must remain atomic on !PREEMPT_RT"? > > yes - although Paul's patch converts only a portion of the signal code > to RCU-read-lock, so i'd expect there to be other latencies too. Might > be worth a test (once we've sorted out the HRT build bugs). And there remains the question of how much of the benefit remains after I handle the corner cases... :-/ Thanx, Paul
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Fri, Aug 12, 2005 at 12:51:17PM +0400, Oleg Nesterov wrote: > Paul E. McKenney wrote: > > > > --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c 2005-08-11 > > 11:44:55.0 -0700 > > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c > > 2005-08-11 12:26:45.0 -0700 > > [ ... snip ... ] > > @@ -785,11 +787,13 @@ no_thread_group: > > recalc_sigpending(); > > > > + oldsighand->deleted = 1; > > + oldsighand->successor = newsighand; > > I don't think this is correct. > > This ->oldsighand can be shared with another CLONE_SIGHAND > process and will not be deleted, just unshared. > > When the signal is sent to that process we must use ->oldsighand > for locking, not the oldsighand->successor == newsighand. Hmmm... Sounds like I mostly managed to dig myself in deeper here. Back to the drawing board... Thanx, Paul
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
Paul E. McKenney wrote: > > --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c2005-08-11 > 11:44:55.0 -0700 > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c > 2005-08-11 12:26:45.0 -0700 > [ ... snip ... ] > @@ -785,11 +787,13 @@ no_thread_group: > recalc_sigpending(); > > + oldsighand->deleted = 1; > + oldsighand->successor = newsighand; I don't think this is correct. This ->oldsighand can be shared with another CLONE_SIGHAND process and will not be deleted, just unshared. When the signal is sent to that process we must use ->oldsighand for locking, not the oldsighand->successor == newsighand. Oleg.
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
* Lee Revell <[EMAIL PROTECTED]> wrote: > Doesn't this fix the longest latency we were seeing with > PREEMPT_DESKTOP, I don't have a trace handy but the upshot was "signal > delivery must remain atomic on !PREEMPT_RT"? yes - although Paul's patch converts only a portion of the signal code to RCU-read-lock, so i'd expect there to be other latencies too. Might be worth a test (once we've sorted out the HRT build bugs). Ingo
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Thu, 2005-08-11 at 11:56 +0200, Ingo Molnar wrote: > > For the record, some shortcomings of this patch: > > > > o Needs lots more testing on more architectures. > > > > o Needs performance and stress testing. > > > > o Needs testing in Ingo's PREEMPT_RT environment. > > cool patch! I have integrated it into my PREEMPT_RT tree, and all it > needed to boot was the patch below (doesnt affect the upstream kernel). > Using the raw IRQ flag isnt an issue in the RCU code, all the affected > codepaths are small and deterministic. > Ingo, Doesn't this fix the longest latency we were seeing with PREEMPT_DESKTOP, I don't have a trace handy but the upshot was "signal delivery must remain atomic on !PREEMPT_RT"? Lee
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Thu, Aug 11, 2005 at 04:16:53PM +0400, Oleg Nesterov wrote: > Paul E. McKenney wrote: > > > > --- linux-2.6.13-rc6/kernel/signal.c2005-08-08 19:59:24.0 > > -0700 > > +++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c2005-08-10 > > 08:20:25.0 -0700 > > @@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct > > > > ret = check_kill_permission(sig, info, p); > > if (!ret && sig && p->sighand) { > > + if (!get_task_struct_rcu(p)) { > > + return -ESRCH; > > + } > > spin_lock_irqsave(&p->sighand->siglock, flags); > ^^^ > Is it correct? Most definitely not! Thank you again for catching this one, would have taken some serious test-and-debug time to root it out the hard way. Fix provided as a patch against V0.7.53-01, probably still has some bugs, as it has not yet been thoroughly tested. General approach is to RCU-protect the sighand pointer from task_struct to sighand_struct. Will be testing more thoroughly, in the meantime, thoughts? Thanx, Paul fs/exec.c |6 +- include/linux/sched.h | 10 ++ kernel/fork.c | 11 +++ kernel/signal.c | 16 +--- 4 files changed, 39 insertions(+), 4 deletions(-) diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c 2005-08-11 11:44:55.0 -0700 +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c 2005-08-11 12:26:45.0 -0700 @@ -773,6 +773,8 @@ no_thread_group: */ spin_lock_init(&newsighand->siglock); atomic_set(&newsighand->count, 1); + newsighand->deleted = 0; + newsighand->successor = NULL; memcpy(newsighand->action, oldsighand->action, sizeof(newsighand->action)); @@ -785,12 +787,14 @@ no_thread_group: recalc_sigpending(); task_unlock(current); + oldsighand->deleted = 1; + oldsighand->successor = newsighand; spin_unlock(&newsighand->siglock); spin_unlock(&oldsighand->siglock); write_unlock_irq(&tasklist_lock); if (atomic_dec_and_test(&oldsighand->count)) - 
kmem_cache_free(sighand_cachep, oldsighand); + sighand_free(oldsighand); } BUG_ON(!thread_group_empty(current)); diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h 2005-08-11 11:44:57.0 -0700 +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h 2005-08-11 12:17:01.0 -0700 @@ -450,8 +450,18 @@ struct sighand_struct { atomic_tcount; struct k_sigaction action[_NSIG]; spinlock_t siglock; + int deleted; + struct sighand_struct *successor; + struct rcu_head rcu; }; +static inline void sighand_free(struct sighand_struct *sp) +{ + extern void sighand_free_cb(struct rcu_head *rhp); + + call_rcu(&sp->rcu, sighand_free_cb); +} + /* * NOTE! "signal_struct" does not have it's own * locking, because a shared signal_struct always diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c 2005-08-11 11:44:57.0 -0700 +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c 2005-08-11 13:05:17.0 -0700 @@ -43,6 +43,7 @@ #include #include #include +#include #include #include @@ -769,6 +770,14 @@ int unshare_files(void) EXPORT_SYMBOL(unshare_files); +void sighand_free_cb(struct rcu_head *rhp) +{ + struct sighand_struct *sp = + container_of(rhp, struct sighand_struct, rcu); + + kmem_cache_free(sighand_cachep, sp); +} + static inline int copy_sighand(unsigned long clone_flags, struct task_struct * tsk) { struct sighand_struct *sig; @@ -783,6 +792,8 @@ static inline int copy_sighand(unsigned return -ENOMEM; spin_lock_init(&sig->siglock); atomic_set(&sig->count, 1); + sig->deleted = 0; + sig->successor = 0; memcpy(sig->action, current->sighand->action, sizeof(sig->action)); return 0; } diff -urpNa -X 
dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Thu, Aug 11, 2005 at 11:30:44PM +0530, Dipankar Sarma wrote: > When I worked on this last (a year or so ago), it seemed that I would > need to put a number of additional structures under RCU control. > It would be better to gradually move it towards RCU rather than > trying to make all the readers lock-free. Just for reference, this was my tasks-rcu patch. 2.6.0-test2, no less :) I was interested in get_pid_list() at that time. IIRC, I tested it with lots of top running along with other tests. Thanks Dipankar Incremental patch to do lockfree traversal of the task list using RCU. For now it just does one of the costliest ones in /proc. Signed-off-by: Dipankar Sarma <[EMAIL PROTECTED]> fs/exec.c |1 + fs/proc/base.c|5 +++-- include/linux/sched.h | 21 - 3 files changed, 20 insertions(+), 7 deletions(-) diff -puN fs/exec.c~tasks-rcu fs/exec.c --- linux-2.6.0-test2-ds/fs/exec.c~tasks-rcu2003-08-04 21:48:38.0 +0530 +++ linux-2.6.0-test2-ds-dipankar/fs/exec.c 2003-08-04 21:49:26.0 +0530 @@ -676,6 +676,7 @@ static inline int de_thread(struct task_ list_del(&current->tasks); list_add_tail(&current->tasks, &init_task.tasks); + list_add_tail_rcu(&current->tasks, &init_task.tasks); current->exit_signal = SIGCHLD; state = leader->state; diff -puN fs/proc/base.c~tasks-rcu fs/proc/base.c --- linux-2.6.0-test2-ds/fs/proc/base.c~tasks-rcu 2003-08-04 21:48:38.0 +0530 +++ linux-2.6.0-test2-ds-dipankar/fs/proc/base.c2003-08-04 21:49:59.0 +0530 @@ -32,6 +32,7 @@ #include #include #include +#include /* * For hysterical raisins we keep the same inumbers as in the old procfs.
@@ -1403,7 +1404,7 @@ static int get_pid_list(int index, unsig int nr_pids = 0; index--; - read_lock(&tasklist_lock); + rcu_read_lock(); for_each_process(p) { int pid = p->pid; if (!pid_alive(p)) @@ -1415,7 +1416,7 @@ static int get_pid_list(int index, unsig if (nr_pids >= PROC_MAXPIDS) break; } - read_unlock(&tasklist_lock); + rcu_read_unlock(); return nr_pids; } diff -puN include/linux/sched.h~tasks-rcu include/linux/sched.h --- linux-2.6.0-test2-ds/include/linux/sched.h~tasks-rcu2003-08-04 21:48:38.0 +0530 +++ linux-2.6.0-test2-ds-dipankar/include/linux/sched.h 2003-08-04 21:54:58.0 +0530 @@ -28,6 +28,7 @@ #include #include #include +#include struct exec_domain; @@ -456,13 +457,23 @@ struct task_struct { struct io_context *io_context; unsigned long ptrace_message; + struct rcu_head rcu; siginfo_t *last_siginfo; /* For ptrace use. */ }; extern void __put_task_struct(struct task_struct *tsk); #define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0) -#define put_task_struct(tsk) \ -do { if (atomic_dec_and_test(&(tsk)->usage)) __put_task_struct(tsk); } while(0) +static void put_task_struct_rcu(struct rcu_head *rcu) +{ + struct task_struct *tsk = container_of(rcu, struct task_struct, rcu); + __put_task_struct(tsk); +} +static inline void put_task_struct(struct task_struct *tsk) +{ + if (atomic_dec_and_test(&tsk->usage)) + call_rcu(&tsk->rcu, put_task_struct_rcu); +} + /* * Per process flags @@ -675,13 +686,13 @@ extern void wait_task_inactive(task_t * #define REMOVE_LINKS(p) do { \ if (thread_group_leader(p)) \ - list_del_init(&(p)->tasks); \ + list_del_rcu(&(p)->tasks); \ remove_parent(p); \ } while (0) #define SET_LINKS(p) do { \ if (thread_group_leader(p)) \ - list_add_tail(&(p)->tasks,&init_task.tasks);\ + list_add_tail_rcu(&(p)->tasks,&init_task.tasks);\ add_parent(p, (p)->parent); \ } while (0) @@ -689,7 +700,7 @@ extern void wait_task_inactive(task_t * #define prev_task(p) list_entry((p)->tasks.prev, struct task_struct, tasks) #define 
for_each_process(p) \ - for (p = &init_task ; (p = next_task(p)) != &init_task ; ) + for (p = &init_task ; (p = next_task(p)),({ read_barrier_depends(); 0;}),p != &init_task ; ) /* * Careful: do_each_thread/while_each_thread is a double loop so _
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Thu, Aug 11, 2005 at 06:14:51PM +0100, Christoph Hellwig wrote: > On Wed, Aug 10, 2005 at 10:11:45AM -0700, Paul E. McKenney wrote: > > Hello! > > > > This patch is an experiment in use of RCU for individual code paths that > > read-acquire the tasklist lock, in this case, unicast signal delivery. > > It passes five kernbenches on 4-CPU x86, but obviously needs much more > > testing before it is considered for serious use, let alone inclusion. > > I think we should switch over tasklist_lock to RCU completely instead of > adding suck hacks. I've started lots of preparation work to get rid of > tasklist_lock users outside of kernel/, especialy getting rid of any > use in modules. That would be really helpful, specially the ones in drivers/char/tty*.c :) When I worked on this last (a year or so ago), it seemed that I would need to put a number of additional structures under RCU control. It would be better to gradually move it towards RCU rather than trying to make all the readers lock-free. Thanks Dipankar
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Thu, Aug 11, 2005 at 06:14:51PM +0100, Christoph Hellwig wrote:
> On Wed, Aug 10, 2005 at 10:11:45AM -0700, Paul E. McKenney wrote:
> > Hello!
> >
> > This patch is an experiment in use of RCU for individual code paths that
> > read-acquire the tasklist lock, in this case, unicast signal delivery.
> > It passes five kernbenches on 4-CPU x86, but obviously needs much more
> > testing before it is considered for serious use, let alone inclusion.
>
> I think we should switch over tasklist_lock to RCU completely instead of
> adding suck hacks. I've started lots of preparation work to get rid of
> tasklist_lock users outside of kernel/, especialy getting rid of any
> use in modules.

I agree fully with your end goal -- I would certainly look funny
disagreeing, right? ;-) And Oleg pointed out an area I missed in my
patch, am fixing, but in the meantime it most certainly does qualify as
a suck hack. :-/ Working on it...

But one of the cool things about RCU is that we can make this change
incrementally, with a series of small patches. We can have some of the
read paths protected by RCU and others continuing to be protected by
read_lock(), so that we can avoid the need for a "flag day" that hits
all 300+ uses of tasklist_lock.

FWIW, the approach that I am taking to fix the bug Oleg and Christoph
spotted is roughly as follows:

o	Add "struct rcu_head rcu", "struct sighand_struct *successor",
	and "int deleted" to struct sighand_struct. This allows
	reliable signal delivery in face of sighand_struct replacements.
	If an RCU reader finds (deleted && !successor), that reader is
	trying to signal a dying process, so gives up. If an RCU reader
	instead finds (deleted && successor), the reader traverses the
	successor pointer to find the current version.

o	Apply call_rcu() to deletion of struct sighand_struct, with the
	exception of a couple of failure paths in exec.c, where the
	sighand_struct was never exposed to RCU readers.

Thoughts?
Thanx, Paul
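The ->deleted/->successor scheme described in this message (an approach Oleg's CLONE_SIGHAND objection elsewhere in the thread leads Paul to abandon in favor of revalidation) amounts to chasing successor pointers until a live sighand is found, and giving up if the chain ends in a deleted entry. A minimal illustration, with invented names:

```c
#include <assert.h>
#include <stddef.h>

struct sighand {
	int deleted;
	struct sighand *successor;
};

/* Return the current sighand, or NULL for a dying process. */
static struct sighand *find_current_sighand(struct sighand *sp)
{
	/* deleted && successor: replaced; follow to the current version */
	while (sp != NULL && sp->deleted && sp->successor != NULL)
		sp = sp->successor;
	/* deleted && !successor: dying process, give up */
	if (sp != NULL && sp->deleted)
		return NULL;
	return sp;
}
```

In the real proposal the reader would also have to lock each sighand it visits; the sketch shows only the traversal logic.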
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Wed, Aug 10, 2005 at 10:11:45AM -0700, Paul E. McKenney wrote: > Hello! > > This patch is an experiment in use of RCU for individual code paths that > read-acquire the tasklist lock, in this case, unicast signal delivery. > It passes five kernbenches on 4-CPU x86, but obviously needs much more > testing before it is considered for serious use, let alone inclusion. I think we should switch over tasklist_lock to RCU completely instead of adding suck hacks. I've started lots of preparation work to get rid of tasklist_lock users outside of kernel/, especialy getting rid of any use in modules.
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Thu, Aug 11, 2005 at 04:16:53PM +0400, Oleg Nesterov wrote: > Paul E. McKenney wrote: > > > > --- linux-2.6.13-rc6/kernel/signal.c2005-08-08 19:59:24.0 > > -0700 > > +++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c2005-08-10 > > 08:20:25.0 -0700 > > @@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct > > > > ret = check_kill_permission(sig, info, p); > > if (!ret && sig && p->sighand) { > > + if (!get_task_struct_rcu(p)) { > > + return -ESRCH; > > + } > > spin_lock_irqsave(&p->sighand->siglock, flags); > ^^^ > Is it correct? > > The caller (kill_proc_info) does not take tasklist_lock anymore. > If p does exec() at this time it can change/free its ->sighand. > > fs/exec.c:de_thread() >773 write_lock_irq(&tasklist_lock); >774 spin_lock(&oldsighand->siglock); >775 spin_lock(&newsighand->siglock); >776 >777 current->sighand = newsighand; >778 recalc_sigpending(); >779 >780 spin_unlock(&newsighand->siglock); >781 spin_unlock(&oldsighand->siglock); >782 write_unlock_irq(&tasklist_lock); >783 >784 if (atomic_dec_and_test(&oldsighand->count)) >785 kmem_cache_free(sighand_cachep, oldsighand); Looks suspicious to me! ;-) Will look into this one, thank you for pointing it out! Thanx, Paul
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
On Thu, Aug 11, 2005 at 11:56:34AM +0200, Ingo Molnar wrote: > > * Paul E. McKenney <[EMAIL PROTECTED]> wrote: > > > Hello! > > > > This patch is an experiment in use of RCU for individual code paths > > that read-acquire the tasklist lock, in this case, unicast signal > > delivery. It passes five kernbenches on 4-CPU x86, but obviously needs > > much more testing before it is considered for serious use, let alone > > inclusion. > > > > My main question is whether I have the POSIX semantics covered. I > > believe that I do, but thought I should check with people who are more > > familiar with POSIX than am I. > > > > For the record, some shortcomings of this patch: > > > > o Needs lots more testing on more architectures. > > > > o Needs performance and stress testing. > > > > o Needs testing in Ingo's PREEMPT_RT environment. > > cool patch! I have integrated it into my PREEMPT_RT tree, and all it > needed to boot was the patch below (doesnt affect the upstream kernel). > Using the raw IRQ flag isnt an issue in the RCU code, all the affected > codepaths are small and deterministic. > > (without this patch it locked up after detecting IRQ7 - not sure why.) Without this patch on an older version of PREEMPT_RT (V0.7.52-12), it would boot, pass kernbench, but fail LTP. Passed both on a stock kernel. Will re-run with your patch. ;-) > kernel still works fine after some (mostly light) testing. Cool! Next step for me is to run some focussed stress tests. 
Thanx, Paul > Ingo > > Index: linux/kernel/rcupdate.c > === > --- linux.orig/kernel/rcupdate.c > +++ linux/kernel/rcupdate.c > @@ -134,11 +134,11 @@ void fastcall call_rcu(struct rcu_head * > > head->func = func; > head->next = NULL; > - local_irq_save(flags); > + raw_local_irq_save(flags); > rdp = &__get_cpu_var(rcu_data); > *rdp->nxttail = head; > rdp->nxttail = &head->next; > - local_irq_restore(flags); > + raw_local_irq_restore(flags); > } > > /** > @@ -165,11 +165,11 @@ void fastcall call_rcu_bh(struct rcu_hea > > head->func = func; > head->next = NULL; > - local_irq_save(flags); > + raw_local_irq_save(flags); > rdp = &__get_cpu_var(rcu_bh_data); > *rdp->nxttail = head; > rdp->nxttail = &head->next; > - local_irq_restore(flags); > + raw_local_irq_restore(flags); > } > > /* > @@ -305,11 +305,11 @@ static void rcu_check_quiescent_state(st > static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list, > struct rcu_head **tail) > { > - local_irq_disable(); > + raw_local_irq_disable(); > *this_rdp->nxttail = list; > if (list) > this_rdp->nxttail = tail; > - local_irq_enable(); > + raw_local_irq_enable(); > } > > static void __rcu_offline_cpu(struct rcu_data *this_rdp, > @@ -362,13 +362,13 @@ static void __rcu_process_callbacks(stru > rdp->curtail = &rdp->curlist; > } > > - local_irq_disable(); > + raw_local_irq_disable(); > if (rdp->nxtlist && !rdp->curlist) { > rdp->curlist = rdp->nxtlist; > rdp->curtail = rdp->nxttail; > rdp->nxtlist = NULL; > rdp->nxttail = &rdp->nxtlist; > - local_irq_enable(); > + raw_local_irq_enable(); > > /* >* start the next batch of callbacks > @@ -388,7 +388,7 @@ static void __rcu_process_callbacks(stru > spin_unlock(&rsp->lock); > } > } else { > - local_irq_enable(); > + raw_local_irq_enable(); > } > rcu_check_quiescent_state(rcp, rsp, rdp); > if (rdp->donelist)
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
Paul E. McKenney wrote:
>
> --- linux-2.6.13-rc6/kernel/signal.c	2005-08-08 19:59:24.0 -0700
> +++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c	2005-08-10 08:20:25.0 -0700
> @@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct
>
> 	ret = check_kill_permission(sig, info, p);
> 	if (!ret && sig && p->sighand) {
> +		if (!get_task_struct_rcu(p)) {
> +			return -ESRCH;
> +		}
> 		spin_lock_irqsave(&p->sighand->siglock, flags);
		^^^
Is it correct? The caller (kill_proc_info) does not take tasklist_lock
anymore. If p does exec() at this time it can change/free its ->sighand.

fs/exec.c:de_thread()

	773	write_lock_irq(&tasklist_lock);
	774	spin_lock(&oldsighand->siglock);
	775	spin_lock(&newsighand->siglock);
	776
	777	current->sighand = newsighand;
	778	recalc_sigpending();
	779
	780	spin_unlock(&newsighand->siglock);
	781	spin_unlock(&oldsighand->siglock);
	782	write_unlock_irq(&tasklist_lock);
	783
	784	if (atomic_dec_and_test(&oldsighand->count))
	785		kmem_cache_free(sighand_cachep, oldsighand);

Oleg.
Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals
* Paul E. McKenney <[EMAIL PROTECTED]> wrote:

> Hello!
>
> This patch is an experiment in use of RCU for individual code paths
> that read-acquire the tasklist lock, in this case, unicast signal
> delivery.  It passes five kernbenches on 4-CPU x86, but obviously needs
> much more testing before it is considered for serious use, let alone
> inclusion.
>
> My main question is whether I have the POSIX semantics covered.  I
> believe that I do, but thought I should check with people who are more
> familiar with POSIX than am I.
>
> For the record, some shortcomings of this patch:
>
> o	Needs lots more testing on more architectures.
>
> o	Needs performance and stress testing.
>
> o	Needs testing in Ingo's PREEMPT_RT environment.

cool patch! I have integrated it into my PREEMPT_RT tree, and all it
needed to boot was the patch below (doesn't affect the upstream kernel).
Using the raw IRQ flag isn't an issue in the RCU code; all the affected
codepaths are small and deterministic. (Without this patch it locked up
after detecting IRQ7 -- not sure why.)

The kernel still works fine after some (mostly light) testing.
	Ingo

Index: linux/kernel/rcupdate.c
===================================================================
--- linux.orig/kernel/rcupdate.c
+++ linux/kernel/rcupdate.c
@@ -134,11 +134,11 @@ void fastcall call_rcu(struct rcu_head *
 
 	head->func = func;
 	head->next = NULL;
-	local_irq_save(flags);
+	raw_local_irq_save(flags);
 	rdp = &__get_cpu_var(rcu_data);
 	*rdp->nxttail = head;
 	rdp->nxttail = &head->next;
-	local_irq_restore(flags);
+	raw_local_irq_restore(flags);
 }
 
 /**
@@ -165,11 +165,11 @@ void fastcall call_rcu_bh(struct rcu_hea
 
 	head->func = func;
 	head->next = NULL;
-	local_irq_save(flags);
+	raw_local_irq_save(flags);
 	rdp = &__get_cpu_var(rcu_bh_data);
 	*rdp->nxttail = head;
 	rdp->nxttail = &head->next;
-	local_irq_restore(flags);
+	raw_local_irq_restore(flags);
 }
 
 /*
@@ -305,11 +305,11 @@ static void rcu_check_quiescent_state(st
 static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list,
 				struct rcu_head **tail)
 {
-	local_irq_disable();
+	raw_local_irq_disable();
 	*this_rdp->nxttail = list;
 	if (list)
 		this_rdp->nxttail = tail;
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 static void __rcu_offline_cpu(struct rcu_data *this_rdp,
@@ -362,13 +362,13 @@ static void __rcu_process_callbacks(stru
 		rdp->curtail = &rdp->curlist;
 	}
 
-	local_irq_disable();
+	raw_local_irq_disable();
 	if (rdp->nxtlist && !rdp->curlist) {
 		rdp->curlist = rdp->nxtlist;
 		rdp->curtail = rdp->nxttail;
 		rdp->nxtlist = NULL;
 		rdp->nxttail = &rdp->nxtlist;
-		local_irq_enable();
+		raw_local_irq_enable();
 
 		/*
 		 * start the next batch of callbacks
@@ -388,7 +388,7 @@ static void __rcu_process_callbacks(stru
 			spin_unlock(&rsp->lock);
 		}
 	} else {
-		local_irq_enable();
+		raw_local_irq_enable();
 	}
 	rcu_check_quiescent_state(rcp, rsp, rdp);
 	if (rdp->donelist)
[RFC,PATCH] Use RCU to protect tasklist for unicast signals
Hello!

This patch is an experiment in use of RCU for individual code paths
that read-acquire the tasklist lock, in this case, unicast signal
delivery.  It passes five kernbenches on 4-CPU x86, but obviously needs
much more testing before it is considered for serious use, let alone
inclusion.

My main question is whether I have the POSIX semantics covered.  I
believe that I do, but thought I should check with people who are more
familiar with POSIX than am I.

For the record, some shortcomings of this patch:

o	Needs lots more testing on more architectures.

o	Needs performance and stress testing.

o	Needs testing in Ingo's PREEMPT_RT environment.

o	Uses cmpxchg(), which is currently architecture dependent.
	This can be fixed, for example, by using the hashed locks
	proposed in an earlier patch from Dipankar:
	http://marc.theaimsgroup.com/?l=linux-kernel&m=111875978502912&w=2

Thoughts?

						Thanx, Paul

---

Not-signed-off-by: [EMAIL PROTECTED]

 include/linux/sched.h |   27 +--
 kernel/sched.c        |    5 +
 kernel/signal.c       |    8 ++--
 3 files changed, 36 insertions(+), 4 deletions(-)

diff -urpN -X dontdiff linux-2.6.13-rc6/include/linux/sched.h linux-2.6.13-rc6-tasklistRCU/include/linux/sched.h
--- linux-2.6.13-rc6/include/linux/sched.h	2005-08-08 19:59:23.0 -0700
+++ linux-2.6.13-rc6-tasklistRCU/include/linux/sched.h	2005-08-09 15:44:48.0 -0700
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 
 struct exec_domain;
 
@@ -770,6 +771,7 @@ struct task_struct {
 	int cpuset_mems_generation;
 #endif
 	atomic_t fs_excl;	/* holding fs exclusive resources */
+	struct rcu_head rcu;
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
@@ -793,8 +795,29 @@ static inline int pid_alive(struct task_
 extern void free_task(struct task_struct *tsk);
 extern void __put_task_struct(struct task_struct *tsk);
 #define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
-#define put_task_struct(tsk) \
-do { if (atomic_dec_and_test(&(tsk)->usage)) __put_task_struct(tsk); } while(0)
+
+static inline int get_task_struct_rcu(struct task_struct *t)
+{
+	int oldusage;
+
+	do {
+		oldusage = atomic_read(&t->usage);
+		if (oldusage == 0) {
+			return 0;
+		}
+	} while (cmpxchg(&t->usage.counter,
+			 oldusage, oldusage + 1) != oldusage);
+	return 1;
+}
+
+extern void __put_task_struct_cb(struct rcu_head *rhp);
+
+static inline void put_task_struct(struct task_struct *t)
+{
+	if (atomic_dec_and_test(&t->usage)) {
+		call_rcu(&t->rcu, __put_task_struct_cb);
+	}
+}
 
 /*
  * Per process flags
diff -urpN -X dontdiff linux-2.6.13-rc6/kernel/sched.c linux-2.6.13-rc6-tasklistRCU/kernel/sched.c
--- linux-2.6.13-rc6/kernel/sched.c	2005-08-08 19:59:24.0 -0700
+++ linux-2.6.13-rc6-tasklistRCU/kernel/sched.c	2005-08-09 12:27:34.0 -0700
@@ -176,6 +176,11 @@ static unsigned int task_timeslice(task_
 #define task_hot(p, now, sd) ((long long) ((now) - (p)->last_ran)	\
 				< (long long) (sd)->cache_hot_time)
 
+void __put_task_struct_cb(struct rcu_head *rhp)
+{
+	__put_task_struct(container_of(rhp, struct task_struct, rcu));
+}
+
 /*
  * These are the runqueue data structures:
  */
diff -urpN -X dontdiff linux-2.6.13-rc6/kernel/signal.c linux-2.6.13-rc6-tasklistRCU/kernel/signal.c
--- linux-2.6.13-rc6/kernel/signal.c	2005-08-08 19:59:24.0 -0700
+++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c	2005-08-10 08:20:25.0 -0700
@@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct
 
 	ret = check_kill_permission(sig, info, p);
 	if (!ret && sig && p->sighand) {
+		if (!get_task_struct_rcu(p)) {
+			return -ESRCH;
+		}
 		spin_lock_irqsave(&p->sighand->siglock, flags);
 		ret = __group_send_sig_info(sig, info, p);
 		spin_unlock_irqrestore(&p->sighand->siglock, flags);
+		put_task_struct(p);
 	}
 
 	return ret;
@@ -1200,12 +1204,12 @@ kill_proc_info(int sig, struct siginfo *
 	int error;
 	struct task_struct *p;
 
-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	p = find_task_by_pid(pid);
 	error = -ESRCH;
 	if (p)
 		error = group_send_sig_info(sig, info, p);
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return error;
 }