Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-19 Thread Paul E. McKenney
On Fri, Aug 19, 2005 at 05:27:24PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > I have indeed been thinking along these lines, but all of the devil plans
> > that I have come up with thus far are quite intrusive, and might have some
> > performance problems in some situations.  So it seems best to remove
> > tasklist_lock in steps:
> >
> > 1.  Single-recipient catch and ignore cases.
> >
> > 2.  Single-recipient stop/continue cases.
> >
> > 3.  Single-recipient fatal cases.
> >
> > 4.  Single-process multi-threaded stop/continue cases.
> >
> > 5.  Single-process multi-threaded fatal cases.
> >
> > 6.  And on to process-group cases.
> 
> Paul, I am not an expert at all, but honestly I don't see how
> this could be achieved. This lock is heavily overloaded for
> quite different purposes. I think that maybe it makes sense
> to try other steps, for example (random order):
> 
>   1. Tasklist protects ->sighand changing (de_thread) - rework
>  sighand access/locking.

We have a start on this, and it is the first area I am writing tests
for.

>   2. Tasklist protects reparenting - fix this.

Good point...  Will require thought.  And tests that target this race.
The tests are what I am working on now.

>   ...
> 
>   N. PTRACE!!! Well, I close my eyes immediately when I see this
>  word in the sources.

This one as well.  Hmmm...  The tests could be fun -- race gdb
grabbing a process against sending that process a signal.  Might
instead need to make a small program that pretends to be gdb;
I will play with it.

Some others that come to mind:

N+1.  Process-group leader dies while a process-group signal is being
received.

N+2.  Process-group members are wildly creating new processes while
a fatal signal is being received (which should result in -all-
processes in the group dying).

N+3.  Threads are being wildly created in a multithreaded process while
a fatal signal is being received...

N+4.  The tsk->signal->curr_target in a multithreaded process changes
while a signal is being received.

N+5.  The handler state (default/ignored/caught) changes while a
signal is being received.  But siglock should cover this.

N+6.  Different signal-reception orders for different tasks in
a given set of groups (e.g., fatal signal to a process group
at the same time as a fatal signal to a multithreaded process
within that group).  I don't believe that this one matters, but
thought I should call it out anyway.

o   SIGHUP/SIGCONT from tty going away vs. SIGSTOP
from elsewhere.

o   tty signal (ctrl-Z, ctrl-C, ...) vs. signal to
specific process in the tty's process group.

And there are probably more, but this should keep me busy for a bit.

> Only then we can eliminate tasklist locking from signal sending
> path. But I don't see an easy way to solve any of these 1 - N
> problems.

If it were easy, someone would likely already have done it...

> Currently I don't see how your patch could be "fixed" for the SIGCONT
> case, except something very ugly:
> 
>   kill_proc_info(sig)
>   {
>   p = find_task_by_pid(pid);
>   if (sig == SIGCONT)
>   read_lock(&tasklist_lock);
> 
>   error = group_send_sig_info(...);
>   ...
>   }

I was indeed thinking along these lines.

> But there are other problems too.
> 
> Look at __group_complete_signal(), it scans p->pids[PIDTYPE_TGID].pid_list
> list to find a suitable thread. What if 'p' does clone(CLONE_THREAD) now?
> Let's look at copy_process(), it does attach_pid(p, PIDTYPE_TGID, p->tgid)
> under the lovely tasklist_lock again.

Another approach that I am considering for this class of problems is to
have the code paths that create or change tasks momentarily acquire
locks corresponding to the groups of tasks that can receive signals.
There are a number of problems with this; for example, the logical place
for the lock corresponding to a process group is the process-group
leader, so there has to be a way to handle the case where the
process-group leader dies before the rest of the processes do.  And so
on, for similar changes.

> So, I don't believe we can solve even the simplest case (single-recipient,
> non fatal, non stop/cont) without significant locking rework.

Agreed, hence the "Not-signed-off-by:" on my original patch.

> I hope that your patch will stimulate this work.

Me too, I will continue working on it.  Your comments have been very helpful!

If nothing else, this thread has shown that there are some worthwhile
benefits from getting rid of tasklist_lock...

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-19 Thread Oleg Nesterov
Paul E. McKenney wrote:
>
> I have indeed been thinking along these lines, but all of the devil plans
> that I have come up with thus far are quite intrusive, and might have some
> performance problems in some situations.  So it seems best to remove
> tasklist_lock in steps:
>
> 1.  Single-recipient catch and ignore cases.
>
> 2.  Single-recipient stop/continue cases.
>
> 3.  Single-recipient fatal cases.
>
> 4.  Single-process multi-threaded stop/continue cases.
>
> 5.  Single-process multi-threaded fatal cases.
>
> 6.  And on to process-group cases.

Paul, I am not an expert at all, but honestly I don't see how
this could be achieved. This lock is heavily overloaded for
quite different purposes. I think that maybe it makes sense
to try other steps, for example (random order):

  1. Tasklist protects ->sighand changing (de_thread) - rework
 sighand access/locking.

  2. Tasklist protects reparenting - fix this.

  ...

  N. PTRACE!!! Well, I close my eyes immediately when I see this
 word in the sources.

Only then we can eliminate tasklist locking from signal sending
path. But I don't see an easy way to solve any of these 1 - N
problems.

Currently I don't see how your patch could be "fixed" for the SIGCONT
case, except something very ugly:

kill_proc_info(sig)
{
p = find_task_by_pid(pid);
if (sig == SIGCONT)
read_lock(&tasklist_lock);

error = group_send_sig_info(...);
...
}

But there are other problems too.

Look at __group_complete_signal(), it scans p->pids[PIDTYPE_TGID].pid_list
list to find a suitable thread. What if 'p' does clone(CLONE_THREAD) now?
Let's look at copy_process(), it does attach_pid(p, PIDTYPE_TGID, p->tgid)
under the lovely tasklist_lock again.

So, I don't believe we can solve even the simplest case (single-recipient,
non fatal, non stop/cont) without significant locking rework.

I hope that your patch will stimulate this work.

Oleg.


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-18 Thread Paul E. McKenney
On Thu, Aug 18, 2005 at 03:48:00PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > On Wed, Aug 17, 2005 at 06:35:03PM +0400, Oleg Nesterov wrote:
> > >
> > > Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
> > > so we always need to lock one ->sighand. Could you please clarify?
> >
> > On the #3 and #4 code paths, the code assumes that the task-list lock
> > is held.  So I was thinking of something (very roughly) as follows:
> >
> > #define SIGLOCK_HOLD_RCU  (1 << 0)
> > #define SIGLOCK_HOLD_TASKLIST (1 << 1)
> > #define SIGLOCK_HOLD_SIGLOCK  (1 << 2)
> 
> Oh, no, sorry for the confusion.
> 
> I meant this function should only lock ->sighand, nothing more, something
> like this:
> 
> // must be called with preemption disabled !!!
> struct sighand_struct *lock_task_sighand(struct task_struct *tsk,
>		unsigned long *flags)
> {
>	struct sighand_struct *sighand;
>
> //	sighand = NULL;
> //	if (!get_task_struct_rcu(tsk))
> //		goto out;
>
>	for (;;) {
>		sighand = tsk->sighand;
>		if (unlikely(sighand == NULL))
>			break;
>
>		spin_lock_irqsave(&sighand->siglock, *flags);
>
>		if (likely(sighand == tsk->sighand))
>			goto out;
>
>		spin_unlock_irqrestore(&sighand->siglock, *flags);
>	}
>
> //	put_task_struct(tsk);
> out:
>	return sighand;
> }
>
> static inline void unlock_task_sighand(struct task_struct *tsk,
>		unsigned long *flags)
> {
>	spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
> //	put_task_struct(tsk);
> }
> 
> int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
> {
>   unsigned long flags;
>   int ret;
> 
>   ret = check_kill_permission(sig, info, p);
>   if (!ret && sig) {
>   ret = -ESRCH;
>   if (lock_task_sighand(p, &flags)) {
>   ret = __group_send_sig_info(sig, info, p);
>   unlock_task_sighand(p, &flags);
>   }
>   }
> 
>   return ret;
> }
> 
> Currently the only user of it will be group_send_sig_info(),

Well, that is one reason that I didn't think that you were talking
about lock_task_sighand() working only on siglock...

>  but I hope
> you have devil plans to kill the "tasklist_lock guards the very rare
> ->sighand change" finally :)

;-)

I have indeed been thinking along these lines, but all of the devil plans
that I have come up with thus far are quite intrusive, and might have some
performance problems in some situations.  So it seems best to remove
tasklist_lock in steps:

1.  Single-recipient catch and ignore cases.

2.  Single-recipient stop/continue cases.

3.  Single-recipient fatal cases.

4.  Single-process multi-threaded stop/continue cases.

5.  Single-process multi-threaded fatal cases.

6.  And on to process-group cases.

There are a number of ways of handling this, and they all have some
scary aspects (and here you thought the current patches were scary!!!):

1.  Provide a single signal-reception data structure for each
multi-recipient case: process for multithreaded processes,
process-group leader for process-group signals, controlling
terminal for tty-based signals.  The task delivering the
signal records the signal (perhaps queuing it) on the
signal-reception data structure, and sends an IPI to each
CPU currently running one of the affected tasks.

What about the tasks not currently running?  There are a
number of devil-planlet alternatives possible here:

a.  Have a daemon responsible for waking up any signaled
tasks (assuming that they are blocked or sleeping
interruptibly).

b.  Have the tasks running on the CPUs each wake up
one sleeping task, which in turn wakes up another
once it receives the signal, and so on.

c.  Have tasks check their signal-reception data structures
(might have up to three: process, process group,
controlling terminal).  Hopefully can get by with
a very few such checks, but...

What to do if several signals arrive at about the same time?
What about consistent ordering of signals to different groups
with partially overlapping task membership?  Are we allowed
to see different signal orderings for different tasks sharing
two different groupings?  Devil-planlet alternatives:

a.  Serialize on controlling terminal (but this might as
well be a global lock on single-user systems).

b.  Allow signals to arrive in different orders at different
threads of a process, if the different signals are
targeted at different groups (e.g., proc

Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-18 Thread Oleg Nesterov
(Replying to wrong message, sorry).

Thomas Gleixner wrote:
>
> --- linux-2.6.13-rc6-rt8/kernel/fork.c	2005-08-17 12:57:08.0 +0200
> +++ linux-2.6.13-rc6-rt/kernel/fork.c	2005-08-17 11:17:46.0 +0200
> @@ -1198,7 +1198,8 @@ bad_fork_cleanup_mm:
>  bad_fork_cleanup_signal:
>   exit_signal(p);
>  bad_fork_cleanup_sighand:
> - exit_sighand(p);
> + if (p->sighand) /* exit_signal() could have freed p->sighand */
> + exit_sighand(p);


Looks like now it is the only user of exit_signal(), and I think
we can kill this function and just do:

bad_fork_cleanup_sighand:

	if (p->sighand) {
		// p->sighand can't change here, we don't need tasklist lock
		if (atomic_dec_and_test(&p->sighand->count))
			// If we get here we are not sharing ->sighand with
			// anybody else.  It means, in particular, that p had
			// no CLONE_THREAD flag.  Nobody can see this process
			// yet, we didn't call attach_pid(), otherwise
			// ->sighand would have been freed from __exit_signal.
			// Thus nobody can see this sighand.
			sighand_free(p->sighand);
	}

It is not an optimization -- this path is rare -- just to make things
clearer and to reduce "false positives" from grepping for tasklist_lock.

And I think it makes sense to add:

#define put_sighand(sig)				\
	do {						\
		if (atomic_dec_and_test(&(sig)->count))	\
			sighand_free(sig);		\
	} while (0)

Oleg.


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-18 Thread Oleg Nesterov
Paul E. McKenney wrote:
>
> On Wed, Aug 17, 2005 at 06:35:03PM +0400, Oleg Nesterov wrote:
> >
> > Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
> > so we always need to lock one ->sighand. Could you please clarify?
>
> On the #3 and #4 code paths, the code assumes that the task-list lock
> is held.  So I was thinking of something (very roughly) as follows:
>
> #define SIGLOCK_HOLD_RCU  (1 << 0)
> #define SIGLOCK_HOLD_TASKLIST (1 << 1)
> #define SIGLOCK_HOLD_SIGLOCK  (1 << 2)

Oh, no, sorry for the confusion.

I meant this function should only lock ->sighand, nothing more, something
like this:

// must be called with preemption disabled !!!
struct sighand_struct *lock_task_sighand(struct task_struct *tsk,
		unsigned long *flags)
{
	struct sighand_struct *sighand;

//	sighand = NULL;
//	if (!get_task_struct_rcu(tsk))
//		goto out;

	for (;;) {
		sighand = tsk->sighand;
		if (unlikely(sighand == NULL))
			break;

		spin_lock_irqsave(&sighand->siglock, *flags);

		if (likely(sighand == tsk->sighand))
			goto out;

		spin_unlock_irqrestore(&sighand->siglock, *flags);
	}

//	put_task_struct(tsk);
out:
	return sighand;
}

static inline void unlock_task_sighand(struct task_struct *tsk,
		unsigned long *flags)
{
	spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
//	put_task_struct(tsk);
}

int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
{
unsigned long flags;
int ret;

ret = check_kill_permission(sig, info, p);
if (!ret && sig) {
ret = -ESRCH;
if (lock_task_sighand(p, &flags)) {
ret = __group_send_sig_info(sig, info, p);
unlock_task_sighand(p, &flags);
}
}

return ret;
}

Currently the only user of it will be group_send_sig_info(), but I hope
you have devil plans to kill the "tasklist_lock guards the very rare
->sighand change" finally :)

Oleg.


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-17 Thread Paul E. McKenney
On Wed, Aug 17, 2005 at 06:35:03PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > > The other thing that jumped out at me is that signals are very different
> > > animals from a locking viewpoint depending on whether they are:
> > >
> > > 1.  ignored,
> > >
> > > 2.  caught by a single thread,
> > >
> > > 3.  fatal to multiple threads/processes (though I don't know
> > >     of anything that shares sighand_struct between separate
> > >     processes), or
> > >
> > > 4.  otherwise global to multiple threads/processes (such as
> > >     SIGSTOP and SIGCONT).
> > >
> > > And there are probably other distinctions that I have not yet caught
> > > on to.
> > >
> > > One way to approach this would be to make your suggested 
> > > lock_task_sighand()
> > > look at the signal and acquire the appropriate locks.  If, having acquired
> > > a given set of locks, it found that the needed set had changed (e.g., due
> > > to racing exec() or sigaction()), then it drops the locks and retries.
> >
> > OK, for this sort of approach to work, lock_task_sighand() must take and
> > return some sort of mask indicating what locks are held.  The mask returned
> > by lock_task_sighand() would then be passed to an unlock_task_sighand().
> 
> Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
> so we always need to lock one ->sighand. Could you please clarify?

On the #3 and #4 code paths, the code assumes that the task-list lock
is held.  So I was thinking of something (very roughly) as follows:

#define SIGLOCK_HOLD_RCU  (1 << 0)
#define SIGLOCK_HOLD_TASKLIST (1 << 1)
#define SIGLOCK_HOLD_SIGLOCK  (1 << 2)

int lock_task_sighand(int sig, struct task_struct *p, int locksheld,
		      struct sighand_struct **spp, unsigned long *flags)
{
	int locksret = 0;
	struct sighand_struct *sp;

retry:
	if (!(locksheld & SIGLOCK_HOLD_RCU)) {
		locksret |= SIGLOCK_HOLD_RCU;
		rcu_read_lock();
	}
	sp = rcu_dereference(p->sighand);
	if (sp == NULL) {
		unlock_task_sighand(NULL, locksret, *flags);
		*spp = NULL;
		return 0;
	}
	if (!(locksheld & SIGLOCK_HOLD_TASKLIST)) {
		/* Complain if siglock held. */
		if (sig_kernel_stop(sig) /* expand for other conditions */) {
			locksret |= SIGLOCK_HOLD_TASKLIST;
			read_lock(&tasklist_lock);
		}
	}
	if (locksheld & SIGLOCK_HOLD_SIGLOCK) {
		*flags = 0;
	} else {
		locksret |= SIGLOCK_HOLD_SIGLOCK;
		spin_lock_irqsave(&sp->siglock, *flags);
		if (p->sighand != sp) {
			unlock_task_sighand(sp, locksret, *flags);
			goto retry;
		}
	}
	*spp = sp;
	return locksret;
}

void unlock_task_sighand(struct sighand_struct *sp, int lockstodrop,
			 unsigned long flags)
{
	/* lockstodrop had better be non-negative! */

	if (lockstodrop & SIGLOCK_HOLD_SIGLOCK) {
		spin_unlock_irqrestore(&sp->siglock, flags);
	}
	if (lockstodrop & SIGLOCK_HOLD_TASKLIST) {
		read_unlock(&tasklist_lock);
	}
	if (lockstodrop & SIGLOCK_HOLD_RCU) {
		rcu_read_unlock();
	}
}

The "expand for other conditions" must also cover signals that cause
coredumps, that kill all threads in a process, or that otherwise affect
more than one thread (e.g., kill with a negative signal number).  I am
assuming that your argument about not needing get_task_struct_rcu() will
eventually work its way through my skull.  ;-)

Of course, the "expand for other conditions" requires reference to the
sighand_struct.  And for that, you really need to be holding siglock.
But you have to drop siglock to acquire tasklist_lock.  So we will end up
relying on RCU some more to safely peek into sighand_struct -before-
acquiring siglock -- which is why I snapshot p->sighand so early; it
will need to be referenced when checking the "other conditions".

I am not exactly happy with the above pair of functions, but I suspect
that they might -really- simplify things a bit.

The usage would be as follows:

if ((lockstodrop = lock_task_sighand(sig, tsk, 0, &sp, &flags)) < 0)
return lockstodrop;  /* error code */
if (sp != NULL) {
/* invoke the function to send the signal */
}
unlock_task_sighand(sp, lockstodrop, flags);

Thoughts?  Hey, you asked for this!!!  ;-)

> > > > +   if (!ret && sig && (sp = p->sighand)) {
> > > > if (!get_task_struct_rcu(p)) {
> > > > return -ESRCH;
> > > > }
> > > > -   spin_lock_irqsave(&p->sighand->siglock, flags);
> > > > +   spin_lock_irqsave(&sp->siglock, flags);
> > > > +   if (p->sighand != sp) {
> > > > +   spin_un

Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-17 Thread Oleg Nesterov
Paul E. McKenney wrote:
>
> > The other thing that jumped out at me is that signals are very different
> > animals from a locking viewpoint depending on whether they are:
> >
> > 1.  ignored,
> >
> > 2.  caught by a single thread,
> >
> > 3.  fatal to multiple threads/processes (though I don't know
> > of anything that shares sighand_struct between separate
> > processes), or
> >
> > 4.  otherwise global to multiple threads/processes (such as
> > SIGSTOP and SIGCONT).
> >
> > And there are probably other distinctions that I have not yet caught
> > on to.
> >
> > One way to approach this would be to make your suggested lock_task_sighand()
> > look at the signal and acquire the appropriate locks.  If, having acquired
> > a given set of locks, it found that the needed set had changed (e.g., due
> > to racing exec() or sigaction()), then it drops the locks and retries.
>
> OK, for this sort of approach to work, lock_task_sighand() must take and
> return some sort of mask indicating what locks are held.  The mask returned
> by lock_task_sighand() would then be passed to an unlock_task_sighand().

Sorry, I don't understand you. CLONE_THREAD implies CLONE_SIGHAND,
so we always need to lock one ->sighand. Could you please clarify?

> > > + if (!ret && sig && (sp = p->sighand)) {
> > >   if (!get_task_struct_rcu(p)) {
> > >   return -ESRCH;
> > >   }
> > > - spin_lock_irqsave(&p->sighand->siglock, flags);
> > > + spin_lock_irqsave(&sp->siglock, flags);
> > > + if (p->sighand != sp) {
> > > + spin_unlock_irqrestore(&sp->siglock, flags);
> > > + put_task_struct(p);
> > > + goto retry;
> > > + }
> > >   ret = __group_send_sig_info(sig, info, p);
> > > - spin_unlock_irqrestore(&p->sighand->siglock, flags);
> > > + spin_unlock_irqrestore(&sp->siglock, flags);
> > >   put_task_struct(p);
> >
> > Do we really need get_task_struct_rcu/put_task_struct here?
> >
> > The task_struct can't go away under us, it is rcu protected.
> > When ->sighand is locked, and it is still the same after
> > the re-check, it means that 'p' has not done __exit_signal()
> > yet, so it is safe to send the signal.
> >
> > And if the task has ->usage == 0, it means that it also has
> > ->sighand == NULL, and your code will notice that.
> >
> > No?
>
> Seems plausible.  I got paranoid after seeing the lock dropped in
> handle_stop_signal(), though.

Yes, this is bad and should be fixed, I agree.

But why do you think we need to bump task->usage? It can't make any
difference, afaics. The task_struct can't disappear, the caller was
converted to use rcu_read_lock() or it holds tasklist_lock.

Nonzero task_struct->usage can't stop do_exit or sys_wait4, it will
only postpone call_rcu(__put_task_struct_cb).

And after we locked ->sighand we have sufficient memory barrier, so
if we read the stale value into 'sp' we will notice that (if you were
worried about this issue).

Have I missed something?

>  void exit_sighand(struct task_struct *tsk)
>  {
>   write_lock_irq(&tasklist_lock);
> - __exit_sighand(tsk);
> + spin_lock(&tsk->sighand->siglock);
> + if (tsk->sighand != NULL) {
> + __exit_sighand(tsk);
> + }
> + spin_unlock(&tsk->sighand->siglock);
>   write_unlock_irq(&tasklist_lock);
>  }

Very strange code. Why this check? And what happens with

spin_unlock(&tsk->sighand->siglock);

when tsk->sighand == NULL ?

Oleg.


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-16 Thread Ingo Molnar

* Paul E. McKenney <[EMAIL PROTECTED]> wrote:

> My tests are not finding even glaring races, so time to go and create 
> some torture tests before getting too much more elaborate.  10,000 
> eyes are nice (and Oleg's eyes do seem to be working especially well), 
> but a good software-test sledgehammer has its uses as well.

i've merged this to the -rt tree, and find below a delta patch relative 
to the previous patch.

Ingo

Index: linux/kernel/exit.c
===
--- linux.orig/kernel/exit.c
+++ linux/kernel/exit.c
@@ -74,7 +74,6 @@ repeat: 
__ptrace_unlink(p);
BUG_ON(!list_empty(&p->ptrace_list) || 
!list_empty(&p->ptrace_children));
__exit_signal(p);
-   __exit_sighand(p);
/*
 * Note that the fastpath in sys_times depends on __exit_signal having
 * updated the counters before a task is removed from the tasklist of
Index: linux/kernel/signal.c
===
--- linux.orig/kernel/signal.c
+++ linux/kernel/signal.c
@@ -328,17 +328,19 @@ void __exit_sighand(struct task_struct *
struct sighand_struct * sighand = tsk->sighand;
 
/* Ok, we're done with the signal handlers */
-   spin_lock(&sighand->siglock);
tsk->sighand = NULL;
if (atomic_dec_and_test(&sighand->count))
sighand_free(sighand);
-   spin_unlock(&sighand->siglock);
 }
 
 void exit_sighand(struct task_struct *tsk)
 {
write_lock_irq(&tasklist_lock);
-   __exit_sighand(tsk);
+   spin_lock(&tsk->sighand->siglock);
+   if (tsk->sighand != NULL) {
+   __exit_sighand(tsk);
+   }
+   spin_unlock(&tsk->sighand->siglock);
write_unlock_irq(&tasklist_lock);
 }
 
@@ -361,6 +363,7 @@ void __exit_signal(struct task_struct *t
if (tsk == sig->curr_target)
sig->curr_target = next_thread(tsk);
tsk->signal = NULL;
+   __exit_sighand(tsk);
spin_unlock(&sighand->siglock);
flush_sigqueue(&sig->shared_pending);
} else {
@@ -392,6 +395,7 @@ void __exit_signal(struct task_struct *t
sig->nvcsw += tsk->nvcsw;
sig->nivcsw += tsk->nivcsw;
sig->sched_time += tsk->sched_time;
+   __exit_sighand(tsk);
spin_unlock(&sighand->siglock);
sig = NULL; /* Marker for below.  */
}


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-16 Thread Paul E. McKenney
On Tue, Aug 16, 2005 at 10:07:14AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 16, 2005 at 03:56:05PM +0400, Oleg Nesterov wrote:
> > Paul E. McKenney wrote:
> > >
> > > OK, the attached instead revalidates that the task struct still references
> > > the sighand_struct after obtaining the lock
> > 
> > Personally I think this is a way to go. A nitpick suggestion,
> > could you make a separate function (say, lock_task_sighand)
> > which does all this job?
> 
> Sounds good, will do!
> 
> The other thing that jumped out at me is that signals are very different
> animals from a locking viewpoint depending on whether they are:
> 
> 1.  ignored,
>
> 2.  caught by a single thread,
>
> 3.  fatal to multiple threads/processes (though I don't know
>     of anything that shares sighand_struct between separate
>     processes), or
>
> 4.  otherwise global to multiple threads/processes (such as
>     SIGSTOP and SIGCONT).
> 
> And there are probably other distinctions that I have not yet caught
> on to.
> 
> One way to approach this would be to make your suggested lock_task_sighand()
> look at the signal and acquire the appropriate locks.  If, having acquired
> a given set of locks, it found that the needed set had changed (e.g., due
> to racing exec() or sigaction()), then it drops the locks and retries.
> 
> Does this make sense?
> 
> This approach assumes that realtime latency (of the kill() operation
> itself) is critical only in cases #1 and #2 above.  This makes sense to me,
> but some of you might know of situations where #3 and #4 are important.
> But I am hoping not.  ;-)

OK, for this sort of approach to work, lock_task_sighand() must take and
return some sort of mask indicating what locks are held.  The mask returned
by lock_task_sighand() would then be passed to an unlock_task_sighand().

Another alternative would be to make a macro that allowed an arbitrary
statement to be wrapped up in it.

Neither of these are as appealing as I would like -- other ideas?

I have not yet attacked this in the attached patch.

> > > > and there are some remaining problems
> > > > that I need to sort out, including:
> > > ...
> > >
> > > o Some of the functions invoked by __group_send_sig_info(),
> > >   including handle_stop_signal(), momentarily drop ->siglock.
> > 
> > Just to be sure that one point doesn't escape your attention, this:
> > 
> > > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c	2005-08-14 19:53:28.0 -0700
> > > @@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct *
> > >   struct sighand_struct * sighand = tsk->sighand;
> > >  
> > >   /* Ok, we're done with the signal handlers */
> > > + spin_lock(&sighand->siglock);
> > >   tsk->sighand = NULL;
> > >   if (atomic_dec_and_test(&sighand->count))
> > > - kmem_cache_free(sighand_cachep, sighand);
> > > + sighand_free(sighand);
> > > + spin_unlock(&sighand->siglock);
> > 
> > is not enough (and unneeded). Unless I missed something, we have
> > a race:
> > 
> > release_task:
> >
> > 	__exit_signal:
> > 		spin_lock(sighand);
> > 		spin_unlock(sighand);
> > 		flush_sigqueue(&sig->shared_pending);
> > 		kmem_cache_free(tsk->signal);
> > 			// here comes group_send_sig_info(), locks ->sighand,
> > 			// delivers the signal to the ->shared_pending.
> > 			// siginfo leaked, or crash.
> > 	__exit_sighand:
> > 		spin_lock(sighand);
> > 		tsk->sighand = NULL;
> > 			// too late
> > 
> > I think that release_task() should not use __exit_sighand()
> > at all. Instead, __exit_signal() should set tsk->sighand = NULL
> > under ->sighand->lock.
> 
> Will look into this -- I was inserting the locking to handle a race with
> my revalidation.  It looks like I also need to pay some more attention
> to the race with exiting tasks, good catch!  Your suggestion of invoking
> __exit_sighand() from under siglock within __exit_signal() sounds good
> at first glance, will think it through.

This turned out to be easy, the attached patch covers it.

> > >  int group_send_sig_info(int sig, struct siginfo *info, struct 
> > > task_struct *p)
> > >  {
> > >   unsigned long flags;
> > > + struct sighand_struct *sp;
> > >   int ret;
> > >
> > > +retry:
> > >   ret = check_kill_permission(sig, info, p);
> > > - if (!ret && sig && p->sighand) {
> > > + if (!ret && sig && (sp = p->sighand)) {
> > >   if (!get_task_struct_rcu(p)) {
> > >   return -ESRCH;
> > >   }
> > > - spin_lock_irqsave(&p->sighand->siglock, flags);
> > > + spin_lock_irqsave(&sp->siglock, flags);
> > > + if (p->sighand != sp) {
> > > + spin_unlock_irqrestore(&sp->siglock, flags);
> > > + put_task_struct(p);
>

Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-16 Thread Paul E. McKenney
On Tue, Aug 16, 2005 at 03:56:05PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > OK, the attached instead revalidates that the task struct still references
> > the sighand_struct after obtaining the lock
> 
> Personally I think this is a way to go. A nitpick suggestion,
> could you make a separate function (say, lock_task_sighand)
> which does all this job?

Sounds good, will do!

The other thing that jumped out at me is that signals are very different
animals from a locking viewpoint depending on whether they are:

1.  ignored,

2.  caught by a single thread,

3.  fatal to multiple threads/processes (though I don't know
of anything that shares sighand_struct between separate
processes), or

4.  otherwise global to multiple threads/processes (such as
SIGSTOP and SIGCONT).

And there are probably other distinctions that I have not yet caught
on to.

One way to approach this would be to make your suggested lock_task_sighand()
look at the signal and acquire the appropriate locks.  If, having acquired
a given set of locks, it found that the needed set had changed (e.g., due
to racing exec() or sigaction()), then it drops the locks and retries.

Does this make sense?

This approach assumes that realtime latency (of the kill() operation
itself) is critical only in cases #1 and #2 above.  This makes sense to me,
but some of you might know of situations where #3 and #4 are important.
But I am hoping not.  ;-)

> > > and there are some remaining problems
> > > that I need to sort out, including:
> > ...
> >
> > o   Some of the functions invoked by __group_send_sig_info(),
> > including handle_stop_signal(), momentarily drop ->siglock.
> 
> Just to be sure that one point doesn't escape your attention, this:
> 
> > +++ 
> > linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c
> > 2005-08-14 19:53:28.0 -0700
> > @@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct *
> > struct sighand_struct * sighand = tsk->sighand;
> >  
> > /* Ok, we're done with the signal handlers */
> > +   spin_lock(&sighand->siglock);
> > tsk->sighand = NULL;
> > if (atomic_dec_and_test(&sighand->count))
> > -   kmem_cache_free(sighand_cachep, sighand);
> > +   sighand_free(sighand);
> > +   spin_unlock(&sighand->siglock);
> 
> is not enough (and unneeded). Unless I missed something, we have
> a race:
> 
> release_task:
> 
> 	__exit_signal:
> 		spin_lock(sighand);
> 		spin_unlock(sighand);
> 		flush_sigqueue(&sig->shared_pending);
> 		kmem_cache_free(tsk->signal);
> 			// here comes group_send_sig_info(), locks ->sighand,
> 			// delivers the signal to the ->shared_pending.
> 			// siginfo leaked, or crash.
> 	__exit_sighand:
> 		spin_lock(sighand);
> 		tsk->sighand = NULL;	// too late
> 
> I think that release_task() should not use __exit_sighand()
> at all. Instead, __exit_signal() should set tsk->sighand = NULL
> under ->sighand->lock.

Will look into this -- I was inserting the locking to handle a race with
my revalidation.  It looks like I also need to pay some more attention
to the race with exiting tasks, good catch!  Your suggestion of setting
tsk->sighand = NULL under ->siglock within __exit_signal() sounds good
at first glance, will think it through.

> >  int group_send_sig_info(int sig, struct siginfo *info, struct task_struct 
> > *p)
> >  {
> > unsigned long flags;
> > +   struct sighand_struct *sp;
> > int ret;
> >
> > +retry:
> > ret = check_kill_permission(sig, info, p);
> > -   if (!ret && sig && p->sighand) {
> > +   if (!ret && sig && (sp = p->sighand)) {
> > if (!get_task_struct_rcu(p)) {
> > return -ESRCH;
> > }
> > -   spin_lock_irqsave(&p->sighand->siglock, flags);
> > +   spin_lock_irqsave(&sp->siglock, flags);
> > +   if (p->sighand != sp) {
> > +   spin_unlock_irqrestore(&sp->siglock, flags);
> > +   put_task_struct(p);
> > +   goto retry;
> > +   }
> > ret = __group_send_sig_info(sig, info, p);
> > -   spin_unlock_irqrestore(&p->sighand->siglock, flags);
> > +   spin_unlock_irqrestore(&sp->siglock, flags);
> > put_task_struct(p);
> 
> Do we really need get_task_struct_rcu/put_task_struct here?
> 
> The task_struct can't go away under us, it is rcu protected.
> When ->sighand is locked, and it is still the same after
> the re-check, it means that 'p' has not done __exit_signal()
> yet, so it is safe to send the signal.
> 
> And if the task has ->usage == 0, it means that it also has
> ->sighand == NULL, and your code will notice that.
> 
> No?

Seems plausible.

Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-16 Thread Oleg Nesterov
Paul E. McKenney wrote:
>
> OK, the attached instead revalidates that the task struct still references
> the sighand_struct after obtaining the lock

Personally I think this is a way to go. A nitpick suggestion,
could you make a separate function (say, lock_task_sighand)
which does all this job?

> > and there are some remaining problems
> > that I need to sort out, including:
> ...
>
> o Some of the functions invoked by __group_send_sig_info(),
>   including handle_stop_signal(), momentarily drop ->siglock.

Just to be sure that one point doesn't escape your attention, this:

> +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c  
> 2005-08-14 19:53:28.0 -0700
> @@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct *
>   struct sighand_struct * sighand = tsk->sighand;
>  
>   /* Ok, we're done with the signal handlers */
> + spin_lock(&sighand->siglock);
>   tsk->sighand = NULL;
>   if (atomic_dec_and_test(&sighand->count))
> - kmem_cache_free(sighand_cachep, sighand);
> + sighand_free(sighand);
> + spin_unlock(&sighand->siglock);

is not enough (and unneeded). Unless I missed something, we have
a race:

release_task:

	__exit_signal:
		spin_lock(sighand);
		spin_unlock(sighand);
		flush_sigqueue(&sig->shared_pending);
		kmem_cache_free(tsk->signal);
			// here comes group_send_sig_info(), locks ->sighand,
			// delivers the signal to the ->shared_pending.
			// siginfo leaked, or crash.
	__exit_sighand:
		spin_lock(sighand);
		tsk->sighand = NULL;	// too late

I think that release_task() should not use __exit_sighand()
at all. Instead, __exit_signal() should set tsk->sighand = NULL
under ->sighand->lock.

>  int group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
>  {
>   unsigned long flags;
> + struct sighand_struct *sp;
>   int ret;
>
> +retry:
>   ret = check_kill_permission(sig, info, p);
> - if (!ret && sig && p->sighand) {
> + if (!ret && sig && (sp = p->sighand)) {
>   if (!get_task_struct_rcu(p)) {
>   return -ESRCH;
>   }
> - spin_lock_irqsave(&p->sighand->siglock, flags);
> + spin_lock_irqsave(&sp->siglock, flags);
> + if (p->sighand != sp) {
> + spin_unlock_irqrestore(&sp->siglock, flags);
> + put_task_struct(p);
> + goto retry;
> + }
>   ret = __group_send_sig_info(sig, info, p);
> - spin_unlock_irqrestore(&p->sighand->siglock, flags);
> + spin_unlock_irqrestore(&sp->siglock, flags);
>   put_task_struct(p);

Do we really need get_task_struct_rcu/put_task_struct here?

The task_struct can't go away under us, it is rcu protected.
When ->sighand is locked, and it is still the same after
the re-check, it means that 'p' has not done __exit_signal()
yet, so it is safe to send the signal.

And if the task has ->usage == 0, it means that it also has
->sighand == NULL, and your code will notice that.

No?

Oleg.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-16 Thread Ingo Molnar

* Paul E. McKenney <[EMAIL PROTECTED]> wrote:

> OK, the attached instead revalidates that the task struct still 
> references the sighand_struct after obtaining the lock (and passes 
> kernbench and LTP, which tells me I need to get better tests!).

i've applied this to the -RT tree, and it's looking good so far from a 
basic stability POV.

Ingo


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-15 Thread Paul E. McKenney
On Fri, Aug 12, 2005 at 12:51:17PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c  2005-08-11 
> > 11:44:55.0 -0700
> > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c  
> > 2005-08-11 12:26:45.0 -0700
> > [ ... snip ... ]
> > @@ -785,11 +787,13 @@ no_thread_group:
> > recalc_sigpending();
> > 
> > +   oldsighand->deleted = 1;
> > +   oldsighand->successor = newsighand;
> 
> I don't think this is correct.
> 
> This ->oldsighand can be shared with another CLONE_SIGHAND
> process and will not be deleted, just unshared.
> 
> When the signal is sent to that process we must use ->oldsighand
> for locking, not the oldsighand->successor == newsighand.

OK, the attached instead revalidates that the task struct still references
the sighand_struct after obtaining the lock (and passes kernbench and
LTP, which tells me I need to get better tests!).  However, your quick
responses motivated me to read more of the code (yes, I know, I should
have done -that- beforehand!!!), and there are some remaining problems
that I need to sort out, including:

o   Fatal signals affect all threads in a process, and the
existing code looks like it assumes that tasklist_lock
is held during delivery of such signals.  In this patch,
that assumption does not hold.

o   Some of the functions invoked by __group_send_sig_info(),
including handle_stop_signal(), momentarily drop ->siglock.

I will be looking at a few approaches to handle these situations, but,
in the meantime, any other combinations that I missed?

Thanx, Paul


diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c  2005-08-11 
11:44:55.0 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c  
2005-08-12 21:18:33.0 -0700
@@ -790,7 +790,7 @@ no_thread_group:
write_unlock_irq(&tasklist_lock);
 
if (atomic_dec_and_test(&oldsighand->count))
-   kmem_cache_free(sighand_cachep, oldsighand);
+   sighand_free(oldsighand);
}
 
BUG_ON(!thread_group_empty(current));
diff -urpNa -X dontdiff 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h  
2005-08-11 11:44:57.0 -0700
+++ 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h  
2005-08-12 21:19:34.0 -0700
@@ -450,8 +450,16 @@ struct sighand_struct {
atomic_tcount;
struct k_sigaction  action[_NSIG];
spinlock_t  siglock;
+   struct rcu_head rcu;
 };
 
+static inline void sighand_free(struct sighand_struct *sp)
+{
+   extern void sighand_free_cb(struct rcu_head *rhp);
+
+   call_rcu(&sp->rcu, sighand_free_cb);
+}
+
 /*
  * NOTE! "signal_struct" does not have it's own
  * locking, because a shared signal_struct always
diff -urpNa -X dontdiff 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c  2005-08-11 
11:44:57.0 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c  
2005-08-12 21:24:00.0 -0700
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include <linux/rcupdate.h>
 
 #include 
 #include 
@@ -769,6 +770,14 @@ int unshare_files(void)
 
 EXPORT_SYMBOL(unshare_files);
 
+void sighand_free_cb(struct rcu_head *rhp)
+{
+   struct sighand_struct *sp =
+   container_of(rhp, struct sighand_struct, rcu);
+
+   kmem_cache_free(sighand_cachep, sp);
+}
+
 static inline int copy_sighand(unsigned long clone_flags, struct task_struct * 
tsk)
 {
struct sighand_struct *sig;
diff -urpNa -X dontdiff 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c
2005-08-11 11:44:57.0 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/signal.c
2005-08-14 19:53:28.0 -0700
@@ -328,9 +328,11 @@ void __exit_sighand(struct task_struct *
struct sighand_struct * sighand = tsk->sighand;
 
/* Ok, we're done with the signal handlers */
+   spin_lock(&sighand->siglock);
tsk->sighand = NULL;
if (atomic_dec_and_test(&sighand->count))
-   kmem_cache_free(sighand_cachep, sighand);
+   sighand_free(sighand);
+   spin_unlock(&sighand->siglock);

Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-12 Thread Paul E. McKenney
On Fri, Aug 12, 2005 at 08:36:00AM +0200, Ingo Molnar wrote:
> 
> * Lee Revell <[EMAIL PROTECTED]> wrote:
> 
> > Doesn't this fix the longest latency we were seeing with 
> > PREEMPT_DESKTOP, I don't have a trace handy but the upshot was "signal 
> > delivery must remain atomic on !PREEMPT_RT"?
> 
> yes - although Paul's patch converts only a portion of the signal code 
> to RCU-read-lock, so i'd expect there to be other latencies too. Might 
> be worth a test (once we've sorted out the HRT build bugs).

And there remains the question of how much of the benefit remains after
I handle the corner cases...  :-/

Thanx, Paul


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-12 Thread Paul E. McKenney
On Fri, Aug 12, 2005 at 12:51:17PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c  2005-08-11 
> > 11:44:55.0 -0700
> > +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c  
> > 2005-08-11 12:26:45.0 -0700
> > [ ... snip ... ]
> > @@ -785,11 +787,13 @@ no_thread_group:
> > recalc_sigpending();
> > 
> > +   oldsighand->deleted = 1;
> > +   oldsighand->successor = newsighand;
> 
> I don't think this is correct.
> 
> This ->oldsighand can be shared with another CLONE_SIGHAND
> process and will not be deleted, just unshared.
> 
> When the signal is sent to that process we must use ->oldsighand
> for locking, not the oldsighand->successor == newsighand.

Hmmm...  Sounds like I mostly managed to dig myself in deeper here.

Back to the drawing board...

Thanx, Paul


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-12 Thread Oleg Nesterov
Paul E. McKenney wrote:
>
> --- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c2005-08-11 
> 11:44:55.0 -0700
> +++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c
> 2005-08-11 12:26:45.0 -0700
> [ ... snip ... ]
> @@ -785,11 +787,13 @@ no_thread_group:
>   recalc_sigpending();
> 
> + oldsighand->deleted = 1;
> + oldsighand->successor = newsighand;

I don't think this is correct.

This ->oldsighand can be shared with another CLONE_SIGHAND
process and will not be deleted, just unshared.

When the signal is sent to that process we must use ->oldsighand
for locking, not the oldsighand->successor == newsighand.

Oleg.


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-11 Thread Ingo Molnar

* Lee Revell <[EMAIL PROTECTED]> wrote:

> Doesn't this fix the longest latency we were seeing with 
> PREEMPT_DESKTOP, I don't have a trace handy but the upshot was "signal 
> delivery must remain atomic on !PREEMPT_RT"?

yes - although Paul's patch converts only a portion of the signal code 
to RCU-read-lock, so i'd expect there to be other latencies too. Might 
be worth a test (once we've sorted out the HRT build bugs).

Ingo


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-11 Thread Lee Revell
On Thu, 2005-08-11 at 11:56 +0200, Ingo Molnar wrote:
> > For the record, some shortcomings of this patch:
> > 
> > o   Needs lots more testing on more architectures.
> > 
> > o   Needs performance and stress testing.
> > 
> > o   Needs testing in Ingo's PREEMPT_RT environment.
> 
> cool patch! I have integrated it into my PREEMPT_RT tree, and all it 
> needed to boot was the patch below (doesnt affect the upstream kernel).  
> Using the raw IRQ flag isnt an issue in the RCU code, all the affected 
> codepaths are small and deterministic.
> 

Ingo,

Doesn't this fix the longest latency we were seeing with
PREEMPT_DESKTOP, I don't have a trace handy but the upshot was "signal
delivery must remain atomic on !PREEMPT_RT"?

Lee



Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-11 Thread Paul E. McKenney
On Thu, Aug 11, 2005 at 04:16:53PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > --- linux-2.6.13-rc6/kernel/signal.c2005-08-08 19:59:24.0 
> > -0700
> > +++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c2005-08-10 
> > 08:20:25.0 -0700
> > @@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct 
> >
> > ret = check_kill_permission(sig, info, p);
> > if (!ret && sig && p->sighand) {
> > +   if (!get_task_struct_rcu(p)) {
> > +   return -ESRCH;
> > +   }
> > spin_lock_irqsave(&p->sighand->siglock, flags);
>   ^^^
> Is it correct?

Most definitely not!  Thank you again for catching this one, would have
taken some serious test-and-debug time to root it out the hard way.

Fix provided as a patch against V0.7.53-01, probably still has some
bugs, as it has not yet been thoroughly tested.  General approach is
to RCU-protect the sighand pointer from task_struct to sighand_struct.

Will be testing more thoroughly, in the meantime, thoughts?

Thanx, Paul

 fs/exec.c |6 +-
 include/linux/sched.h |   10 ++
 kernel/fork.c |   11 +++
 kernel/signal.c   |   16 +---
 4 files changed, 39 insertions(+), 4 deletions(-)

diff -urpNa -X dontdiff linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/fs/exec.c  2005-08-11 
11:44:55.0 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/fs/exec.c  
2005-08-11 12:26:45.0 -0700
@@ -773,6 +773,8 @@ no_thread_group:
 */
spin_lock_init(&newsighand->siglock);
atomic_set(&newsighand->count, 1);
+   newsighand->deleted = 0;
+   newsighand->successor = NULL;
memcpy(newsighand->action, oldsighand->action,
   sizeof(newsighand->action));
 
@@ -785,12 +787,14 @@ no_thread_group:
recalc_sigpending();
 
task_unlock(current);
+   oldsighand->deleted = 1;
+   oldsighand->successor = newsighand;
spin_unlock(&newsighand->siglock);
spin_unlock(&oldsighand->siglock);
write_unlock_irq(&tasklist_lock);
 
if (atomic_dec_and_test(&oldsighand->count))
-   kmem_cache_free(sighand_cachep, oldsighand);
+   sighand_free(oldsighand);
}
 
BUG_ON(!thread_group_empty(current));
diff -urpNa -X dontdiff 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/include/linux/sched.h  
2005-08-11 11:44:57.0 -0700
+++ 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/include/linux/sched.h  
2005-08-11 12:17:01.0 -0700
@@ -450,8 +450,18 @@ struct sighand_struct {
atomic_tcount;
struct k_sigaction  action[_NSIG];
spinlock_t  siglock;
+   int deleted;
+   struct sighand_struct   *successor;
+   struct rcu_head rcu;
 };
 
+static inline void sighand_free(struct sighand_struct *sp)
+{
+   extern void sighand_free_cb(struct rcu_head *rhp);
+
+   call_rcu(&sp->rcu, sighand_free_cb);
+}
+
 /*
  * NOTE! "signal_struct" does not have it's own
  * locking, because a shared signal_struct always
diff -urpNa -X dontdiff 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c
--- linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/fork.c  2005-08-11 
11:44:57.0 -0700
+++ linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-tasklistRCU/kernel/fork.c  
2005-08-11 13:05:17.0 -0700
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include <linux/rcupdate.h>
 
 #include 
 #include 
@@ -769,6 +770,14 @@ int unshare_files(void)
 
 EXPORT_SYMBOL(unshare_files);
 
+void sighand_free_cb(struct rcu_head *rhp)
+{
+   struct sighand_struct *sp =
+   container_of(rhp, struct sighand_struct, rcu);
+
+   kmem_cache_free(sighand_cachep, sp);
+}
+
 static inline int copy_sighand(unsigned long clone_flags, struct task_struct * 
tsk)
 {
struct sighand_struct *sig;
@@ -783,6 +792,8 @@ static inline int copy_sighand(unsigned 
return -ENOMEM;
spin_lock_init(&sig->siglock);
atomic_set(&sig->count, 1);
+   sig->deleted = 0;
+   sig->successor = 0;
memcpy(sig->action, current->sighand->action, sizeof(sig->action));
return 0;
 }
diff -urpNa -X dontdiff 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01/kernel/signal.c 
linux-2.6.13-rc4-realtime-preempt-V0.7.53-01-

Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-11 Thread Dipankar Sarma
On Thu, Aug 11, 2005 at 11:30:44PM +0530, Dipankar Sarma wrote:
> When I worked on this last (a year or so ago), it seemed that I would
> need to put a number of additional structures under RCU control.
> It would be better to gradually move it towards RCU rather than
trying to make all the readers lock-free.

Just for reference, this was my tasks-rcu patch. 2.6.0-test2, no less :)
I was interested in get_pid_list() at that time. IIRC, I tested it with
lots of top running along with other tests.

Thanks
Dipankar


Incremental patch to do lockfree traversal of the task list using
RCU. For now it just does one of the costliest ones in /proc.

Signed-off-by: Dipankar Sarma <[EMAIL PROTECTED]>


 fs/exec.c |1 +
 fs/proc/base.c|5 +++--
 include/linux/sched.h |   21 -
 3 files changed, 20 insertions(+), 7 deletions(-)

diff -puN fs/exec.c~tasks-rcu fs/exec.c
--- linux-2.6.0-test2-ds/fs/exec.c~tasks-rcu2003-08-04 21:48:38.0 
+0530
+++ linux-2.6.0-test2-ds-dipankar/fs/exec.c 2003-08-04 21:49:26.0 
+0530
@@ -676,6 +676,7 @@ static inline int de_thread(struct task_
 
	list_del(&current->tasks);
	list_add_tail(&current->tasks, &init_task.tasks);
+	list_add_tail_rcu(&current->tasks, &init_task.tasks);
current->exit_signal = SIGCHLD;
state = leader->state;
 
diff -puN fs/proc/base.c~tasks-rcu fs/proc/base.c
--- linux-2.6.0-test2-ds/fs/proc/base.c~tasks-rcu   2003-08-04 
21:48:38.0 +0530
+++ linux-2.6.0-test2-ds-dipankar/fs/proc/base.c2003-08-04 
21:49:59.0 +0530
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include <linux/rcupdate.h>
 
 /*
  * For hysterical raisins we keep the same inumbers as in the old procfs.
@@ -1403,7 +1404,7 @@ static int get_pid_list(int index, unsig
int nr_pids = 0;
 
index--;
-   read_lock(&tasklist_lock);
+   rcu_read_lock();
for_each_process(p) {
int pid = p->pid;
if (!pid_alive(p))
@@ -1415,7 +1416,7 @@ static int get_pid_list(int index, unsig
if (nr_pids >= PROC_MAXPIDS)
break;
}
-   read_unlock(&tasklist_lock);
+   rcu_read_unlock();
return nr_pids;
 }
 
diff -puN include/linux/sched.h~tasks-rcu include/linux/sched.h
--- linux-2.6.0-test2-ds/include/linux/sched.h~tasks-rcu2003-08-04 
21:48:38.0 +0530
+++ linux-2.6.0-test2-ds-dipankar/include/linux/sched.h 2003-08-04 
21:54:58.0 +0530
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include <linux/rcupdate.h>
 
 struct exec_domain;
 
@@ -456,13 +457,23 @@ struct task_struct {
struct io_context *io_context;
 
unsigned long ptrace_message;
+   struct rcu_head rcu;
siginfo_t *last_siginfo; /* For ptrace use.  */
 };
 
 extern void __put_task_struct(struct task_struct *tsk);
 #define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
-#define put_task_struct(tsk) \
-do { if (atomic_dec_and_test(&(tsk)->usage)) __put_task_struct(tsk); } while(0)
+static void put_task_struct_rcu(struct rcu_head *rcu)
+{
+   struct task_struct *tsk = container_of(rcu, struct task_struct, rcu);
+   __put_task_struct(tsk);
+}
+static inline void put_task_struct(struct task_struct *tsk)
+{
+   if (atomic_dec_and_test(&tsk->usage))
+   call_rcu(&tsk->rcu, put_task_struct_rcu);
+}
+
 
 /*
  * Per process flags
@@ -675,13 +686,13 @@ extern void wait_task_inactive(task_t * 
 
 #define REMOVE_LINKS(p) do {   \
if (thread_group_leader(p)) \
-   list_del_init(&(p)->tasks); \
+   list_del_rcu(&(p)->tasks);  \
remove_parent(p);   \
} while (0)
 
 #define SET_LINKS(p) do {  \
if (thread_group_leader(p)) \
-   list_add_tail(&(p)->tasks,&init_task.tasks);\
+   list_add_tail_rcu(&(p)->tasks,&init_task.tasks);\
add_parent(p, (p)->parent); \
} while (0)
 
@@ -689,7 +700,7 @@ extern void wait_task_inactive(task_t * 
 #define prev_task(p)   list_entry((p)->tasks.prev, struct task_struct, tasks)
 
 #define for_each_process(p) \
-   for (p = &init_task ; (p = next_task(p)) != &init_task ; )
	for (p = &init_task ; (p = next_task(p)),({ read_barrier_depends(); 0;}),p != &init_task ; )
 
 /*
  * Careful: do_each_thread/while_each_thread is a double loop so



Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-11 Thread Dipankar Sarma
On Thu, Aug 11, 2005 at 06:14:51PM +0100, Christoph Hellwig wrote:
> On Wed, Aug 10, 2005 at 10:11:45AM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > This patch is an experiment in use of RCU for individual code paths that
> > read-acquire the tasklist lock, in this case, unicast signal delivery.
> > It passes five kernbenches on 4-CPU x86, but obviously needs much more
> > testing before it is considered for serious use, let alone inclusion.
> 
> I think we should switch over tasklist_lock to RCU completely instead of
> adding suck hacks.  I've started lots of preparation work to get rid of
> tasklist_lock users outside of kernel/, especialy getting rid of any
> use in modules.

That would be really helpful, specially the ones in drivers/char/tty*.c :)

When I worked on this last (a year or so ago), it seemed that I would
need to put a number of additional structures under RCU control.
It would be better to gradually move it towards RCU rather than
trying to make all the readers lock-free.

Thanks
Dipankar


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-11 Thread Paul E. McKenney
On Thu, Aug 11, 2005 at 06:14:51PM +0100, Christoph Hellwig wrote:
> On Wed, Aug 10, 2005 at 10:11:45AM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > This patch is an experiment in use of RCU for individual code paths that
> > read-acquire the tasklist lock, in this case, unicast signal delivery.
> > It passes five kernbenches on 4-CPU x86, but obviously needs much more
> > testing before it is considered for serious use, let alone inclusion.
> 
> I think we should switch over tasklist_lock to RCU completely instead of
> adding suck hacks.  I've started lots of preparation work to get rid of
> tasklist_lock users outside of kernel/, especially getting rid of any
> use in modules.

I agree fully with your end goal -- I would certainly look funny
disagreeing, right?  ;-)  And Oleg pointed out an area I missed in my
patch, am fixing, but in the meantime it most certainly does qualify as
a suck hack.  :-/  Working on it...

But one of the cool things about RCU is that we can make this change
incrementally, with a series of small patches.  We can have some of
the read paths protected by RCU and others continuing to be protected
by read_lock(), so that we can avoid the need for a "flag day" that
hits all 300+ uses of tasklist_lock.

FWIW, the approach that I am taking to fix the bug Oleg and Christoph
spotted is roughly as follows:

o   Add "struct rcu_head rcu", "struct sighand_struct *successor",
and "int deleted" to struct sighand_struct.  This allows
reliable signal delivery in face of sighand_struct replacements.
If an RCU reader finds (deleted && !successor), that reader
is trying to signal a dying process, so gives up.  If an
RCU reader instead finds (deleted && successor), the reader
traverses the successor pointer to find the current version.

o   Apply call_rcu() to deletion of struct sighand_struct, with
the exception of a couple of failure paths in exec.c, where
the sighand_struct was never exposed to RCU readers.

Thoughts?

Thanx, Paul


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-11 Thread Christoph Hellwig
On Wed, Aug 10, 2005 at 10:11:45AM -0700, Paul E. McKenney wrote:
> Hello!
> 
> This patch is an experiment in use of RCU for individual code paths that
> read-acquire the tasklist lock, in this case, unicast signal delivery.
> It passes five kernbenches on 4-CPU x86, but obviously needs much more
> testing before it is considered for serious use, let alone inclusion.

I think we should switch over tasklist_lock to RCU completely instead of
adding suck hacks.  I've started lots of preparation work to get rid of
tasklist_lock users outside of kernel/, especially getting rid of any
use in modules.



Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-11 Thread Paul E. McKenney
On Thu, Aug 11, 2005 at 04:16:53PM +0400, Oleg Nesterov wrote:
> Paul E. McKenney wrote:
> >
> > --- linux-2.6.13-rc6/kernel/signal.c2005-08-08 19:59:24.0 
> > -0700
> > +++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c2005-08-10 
> > 08:20:25.0 -0700
> > @@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct 
> >
> > ret = check_kill_permission(sig, info, p);
> > if (!ret && sig && p->sighand) {
> > +   if (!get_task_struct_rcu(p)) {
> > +   return -ESRCH;
> > +   }
> > spin_lock_irqsave(&p->sighand->siglock, flags);
>   ^^^
> Is it correct?
> 
> The caller (kill_proc_info) does not take tasklist_lock anymore.
> If p does exec() at this time it can change/free its ->sighand.
> 
> fs/exec.c:de_thread()
>773  write_lock_irq(&tasklist_lock);
>774  spin_lock(&oldsighand->siglock);
>775  spin_lock(&newsighand->siglock);
>776
>777  current->sighand = newsighand;
>778  recalc_sigpending();
>779
>780  spin_unlock(&newsighand->siglock);
>781  spin_unlock(&oldsighand->siglock);
>782  write_unlock_irq(&tasklist_lock);
>783
>784  if (atomic_dec_and_test(&oldsighand->count))
>785  kmem_cache_free(sighand_cachep, oldsighand);

Looks suspicious to me!  ;-)  Will look into this one, thank you for
pointing it out!

Thanx, Paul


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-11 Thread Paul E. McKenney
On Thu, Aug 11, 2005 at 11:56:34AM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <[EMAIL PROTECTED]> wrote:
> 
> > Hello!
> > 
> > This patch is an experiment in use of RCU for individual code paths 
> > that read-acquire the tasklist lock, in this case, unicast signal 
> > delivery. It passes five kernbenches on 4-CPU x86, but obviously needs 
> > much more testing before it is considered for serious use, let alone 
> > inclusion.
> > 
> > My main question is whether I have the POSIX semantics covered.  I 
> > believe that I do, but thought I should check with people who are more 
> > familiar with POSIX than am I.
> > 
> > For the record, some shortcomings of this patch:
> > 
> > o   Needs lots more testing on more architectures.
> > 
> > o   Needs performance and stress testing.
> > 
> > o   Needs testing in Ingo's PREEMPT_RT environment.
> 
> cool patch! I have integrated it into my PREEMPT_RT tree, and all it 
> needed to boot was the patch below (doesn't affect the upstream kernel).  
> Using the raw IRQ flag isn't an issue in the RCU code; all the affected 
> codepaths are small and deterministic.
> 
> (without this patch it locked up after detecting IRQ7 - not sure why.)

Without this patch, an older version of PREEMPT_RT (V0.7.52-12) would
boot and pass kernbench, but fail LTP.  Both passed on a stock kernel.

Will re-run with your patch.  ;-)

> kernel still works fine after some (mostly light) testing.

Cool!  Next step for me is to run some focused stress tests.

Thanx, Paul

>   Ingo
> 
> Index: linux/kernel/rcupdate.c
> ===
> --- linux.orig/kernel/rcupdate.c
> +++ linux/kernel/rcupdate.c
> @@ -134,11 +134,11 @@ void fastcall call_rcu(struct rcu_head *
>  
>   head->func = func;
>   head->next = NULL;
> - local_irq_save(flags);
> + raw_local_irq_save(flags);
>   rdp = &__get_cpu_var(rcu_data);
>   *rdp->nxttail = head;
>   rdp->nxttail = &head->next;
> - local_irq_restore(flags);
> + raw_local_irq_restore(flags);
>  }
>  
>  /**
> @@ -165,11 +165,11 @@ void fastcall call_rcu_bh(struct rcu_hea
>  
>   head->func = func;
>   head->next = NULL;
> - local_irq_save(flags);
> + raw_local_irq_save(flags);
>   rdp = &__get_cpu_var(rcu_bh_data);
>   *rdp->nxttail = head;
>   rdp->nxttail = &head->next;
> - local_irq_restore(flags);
> + raw_local_irq_restore(flags);
>  }
>  
>  /*
> @@ -305,11 +305,11 @@ static void rcu_check_quiescent_state(st
>  static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list,
>   struct rcu_head **tail)
>  {
> - local_irq_disable();
> + raw_local_irq_disable();
>   *this_rdp->nxttail = list;
>   if (list)
>   this_rdp->nxttail = tail;
> - local_irq_enable();
> + raw_local_irq_enable();
>  }
>  
>  static void __rcu_offline_cpu(struct rcu_data *this_rdp,
> @@ -362,13 +362,13 @@ static void __rcu_process_callbacks(stru
>   rdp->curtail = &rdp->curlist;
>   }
>  
> - local_irq_disable();
> + raw_local_irq_disable();
>   if (rdp->nxtlist && !rdp->curlist) {
>   rdp->curlist = rdp->nxtlist;
>   rdp->curtail = rdp->nxttail;
>   rdp->nxtlist = NULL;
>   rdp->nxttail = &rdp->nxtlist;
> - local_irq_enable();
> + raw_local_irq_enable();
>  
>   /*
>* start the next batch of callbacks
> @@ -388,7 +388,7 @@ static void __rcu_process_callbacks(stru
>   spin_unlock(&rsp->lock);
>   }
>   } else {
> - local_irq_enable();
> + raw_local_irq_enable();
>   }
>   rcu_check_quiescent_state(rcp, rsp, rdp);
>   if (rdp->donelist)
> 


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-11 Thread Oleg Nesterov
Paul E. McKenney wrote:
>
> --- linux-2.6.13-rc6/kernel/signal.c  2005-08-08 19:59:24.0 -0700
> +++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c  2005-08-10 
> 08:20:25.0 -0700
> @@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct 
>
>   ret = check_kill_permission(sig, info, p);
>   if (!ret && sig && p->sighand) {
> + if (!get_task_struct_rcu(p)) {
> + return -ESRCH;
> + }
>   spin_lock_irqsave(&p->sighand->siglock, flags);
  ^^^
Is it correct?

The caller (kill_proc_info) does not take tasklist_lock anymore.
If p does exec() at this time it can change/free its ->sighand.

fs/exec.c:de_thread()
   773  write_lock_irq(&tasklist_lock);
   774  spin_lock(&oldsighand->siglock);
   775  spin_lock(&newsighand->siglock);
   776
   777  current->sighand = newsighand;
   778  recalc_sigpending();
   779
   780  spin_unlock(&newsighand->siglock);
   781  spin_unlock(&oldsighand->siglock);
   782  write_unlock_irq(&tasklist_lock);
   783
   784  if (atomic_dec_and_test(&oldsighand->count))
   785  kmem_cache_free(sighand_cachep, oldsighand);

Oleg.


Re: [RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-11 Thread Ingo Molnar

* Paul E. McKenney <[EMAIL PROTECTED]> wrote:

> Hello!
> 
> This patch is an experiment in use of RCU for individual code paths 
> that read-acquire the tasklist lock, in this case, unicast signal 
> delivery. It passes five kernbenches on 4-CPU x86, but obviously needs 
> much more testing before it is considered for serious use, let alone 
> inclusion.
> 
> My main question is whether I have the POSIX semantics covered.  I 
> believe that I do, but thought I should check with people who are more 
> familiar with POSIX than am I.
> 
> For the record, some shortcomings of this patch:
> 
> o Needs lots more testing on more architectures.
> 
> o Needs performance and stress testing.
> 
> o Needs testing in Ingo's PREEMPT_RT environment.

cool patch! I have integrated it into my PREEMPT_RT tree, and all it 
needed to boot was the patch below (doesn't affect the upstream kernel).  
Using the raw IRQ flag isn't an issue in the RCU code; all the affected 
codepaths are small and deterministic.

(without this patch it locked up after detecting IRQ7 - not sure why.)

kernel still works fine after some (mostly light) testing.

Ingo

Index: linux/kernel/rcupdate.c
===
--- linux.orig/kernel/rcupdate.c
+++ linux/kernel/rcupdate.c
@@ -134,11 +134,11 @@ void fastcall call_rcu(struct rcu_head *
 
head->func = func;
head->next = NULL;
-   local_irq_save(flags);
+   raw_local_irq_save(flags);
rdp = &__get_cpu_var(rcu_data);
*rdp->nxttail = head;
rdp->nxttail = &head->next;
-   local_irq_restore(flags);
+   raw_local_irq_restore(flags);
 }
 
 /**
@@ -165,11 +165,11 @@ void fastcall call_rcu_bh(struct rcu_hea
 
head->func = func;
head->next = NULL;
-   local_irq_save(flags);
+   raw_local_irq_save(flags);
rdp = &__get_cpu_var(rcu_bh_data);
*rdp->nxttail = head;
rdp->nxttail = &head->next;
-   local_irq_restore(flags);
+   raw_local_irq_restore(flags);
 }
 
 /*
@@ -305,11 +305,11 @@ static void rcu_check_quiescent_state(st
 static void rcu_move_batch(struct rcu_data *this_rdp, struct rcu_head *list,
struct rcu_head **tail)
 {
-   local_irq_disable();
+   raw_local_irq_disable();
*this_rdp->nxttail = list;
if (list)
this_rdp->nxttail = tail;
-   local_irq_enable();
+   raw_local_irq_enable();
 }
 
 static void __rcu_offline_cpu(struct rcu_data *this_rdp,
@@ -362,13 +362,13 @@ static void __rcu_process_callbacks(stru
rdp->curtail = &rdp->curlist;
}
 
-   local_irq_disable();
+   raw_local_irq_disable();
if (rdp->nxtlist && !rdp->curlist) {
rdp->curlist = rdp->nxtlist;
rdp->curtail = rdp->nxttail;
rdp->nxtlist = NULL;
rdp->nxttail = &rdp->nxtlist;
-   local_irq_enable();
+   raw_local_irq_enable();
 
/*
 * start the next batch of callbacks
@@ -388,7 +388,7 @@ static void __rcu_process_callbacks(stru
spin_unlock(&rsp->lock);
}
} else {
-   local_irq_enable();
+   raw_local_irq_enable();
}
rcu_check_quiescent_state(rcp, rsp, rdp);
if (rdp->donelist)


[RFC,PATCH] Use RCU to protect tasklist for unicast signals

2005-08-10 Thread Paul E. McKenney
Hello!

This patch is an experiment in use of RCU for individual code paths that
read-acquire the tasklist lock, in this case, unicast signal delivery.
It passes five kernbenches on 4-CPU x86, but obviously needs much more
testing before it is considered for serious use, let alone inclusion.

My main question is whether I have the POSIX semantics covered.  I believe
that I do, but thought I should check with people who are more familiar
with POSIX than am I.

For the record, some shortcomings of this patch:

o   Needs lots more testing on more architectures.

o   Needs performance and stress testing.

o   Needs testing in Ingo's PREEMPT_RT environment.

o   Uses cmpxchg(), which is currently architecture dependent.
This can be fixed, for example, by using the hashed locks
proposed in an earlier patch from Dipankar:

http://marc.theaimsgroup.com/?l=linux-kernel&m=111875978502912&w=2

Thoughts?

Thanx, Paul

---

Not-signed-off-by: [EMAIL PROTECTED]

 include/linux/sched.h |   27 +--
 kernel/sched.c|5 +
 kernel/signal.c   |8 ++--
 3 files changed, 36 insertions(+), 4 deletions(-)

diff -urpN -X dontdiff linux-2.6.13-rc6/include/linux/sched.h 
linux-2.6.13-rc6-tasklistRCU/include/linux/sched.h
--- linux-2.6.13-rc6/include/linux/sched.h  2005-08-08 19:59:23.0 
-0700
+++ linux-2.6.13-rc6-tasklistRCU/include/linux/sched.h  2005-08-09 
15:44:48.0 -0700
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct exec_domain;
 
@@ -770,6 +771,7 @@ struct task_struct {
int cpuset_mems_generation;
 #endif
atomic_t fs_excl;   /* holding fs exclusive resources */
+   struct rcu_head rcu;
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
@@ -793,8 +795,29 @@ static inline int pid_alive(struct task_
 extern void free_task(struct task_struct *tsk);
 extern void __put_task_struct(struct task_struct *tsk);
 #define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
-#define put_task_struct(tsk) \
-do { if (atomic_dec_and_test(&(tsk)->usage)) __put_task_struct(tsk); } while(0)
+
+static inline int get_task_struct_rcu(struct task_struct *t)
+{
+   int oldusage;
+
+   do {
+   oldusage = atomic_read(&t->usage);
+   if (oldusage == 0) {
+   return 0;
+   }
+   } while (cmpxchg(&t->usage.counter,
+oldusage, oldusage + 1) != oldusage);
+   return 1;
+}
+
+extern void __put_task_struct_cb(struct rcu_head *rhp);
+
+static inline void put_task_struct(struct task_struct *t)
+{
+   if (atomic_dec_and_test(&t->usage)) {
+   call_rcu(&t->rcu, __put_task_struct_cb);
+   }
+}
 
 /*
  * Per process flags
diff -urpN -X dontdiff linux-2.6.13-rc6/kernel/sched.c 
linux-2.6.13-rc6-tasklistRCU/kernel/sched.c
--- linux-2.6.13-rc6/kernel/sched.c 2005-08-08 19:59:24.0 -0700
+++ linux-2.6.13-rc6-tasklistRCU/kernel/sched.c 2005-08-09 12:27:34.0 
-0700
@@ -176,6 +176,11 @@ static unsigned int task_timeslice(task_
 #define task_hot(p, now, sd) ((long long) ((now) - (p)->last_ran)  \
< (long long) (sd)->cache_hot_time)
 
+void __put_task_struct_cb(struct rcu_head *rhp)
+{
+   __put_task_struct(container_of(rhp, struct task_struct, rcu));
+}
+
 /*
  * These are the runqueue data structures:
  */
diff -urpN -X dontdiff linux-2.6.13-rc6/kernel/signal.c 
linux-2.6.13-rc6-tasklistRCU/kernel/signal.c
--- linux-2.6.13-rc6/kernel/signal.c2005-08-08 19:59:24.0 -0700
+++ linux-2.6.13-rc6-tasklistRCU/kernel/signal.c2005-08-10 
08:20:25.0 -0700
@@ -1151,9 +1151,13 @@ int group_send_sig_info(int sig, struct 
 
ret = check_kill_permission(sig, info, p);
if (!ret && sig && p->sighand) {
+   if (!get_task_struct_rcu(p)) {
+   return -ESRCH;
+   }
spin_lock_irqsave(&p->sighand->siglock, flags);
ret = __group_send_sig_info(sig, info, p);
spin_unlock_irqrestore(&p->sighand->siglock, flags);
+   put_task_struct(p);
}
 
return ret;
@@ -1200,12 +1204,12 @@ kill_proc_info(int sig, struct siginfo *
int error;
struct task_struct *p;
 
-   read_lock(&tasklist_lock);
+   rcu_read_lock();
p = find_task_by_pid(pid);
error = -ESRCH;
if (p)
error = group_send_sig_info(sig, info, p);
-   read_unlock(&tasklist_lock);
+   rcu_read_unlock();
return error;
 }
 