Oleg Nesterov <[EMAIL PROTECTED]> writes:

> On 12/09, Eric W. Biederman wrote:
>>
>> Equally messed up is our status in /proc at that point, which
>> says our sleeping process is a zombie.
>
> Yes, this is annoying.
>
>> I'm thinking we need to do at least some of the thread group leadership
>> transfer in do_exit, instead of de_thread.  Then p->group_leader->exit_state
>> would be sufficient to see if the entire thread group was alive,
>> as the group_leader would be whoever was left alive.  The original
>> group_leader might still need to be kept around for its pid...
>>
>> I think that would solve most of the problems you have with a dead
>> thread group leader and sending SIG_STOP as well.
>
> Yes I was thinking about that too, but I am not brave enough to even
> try to think it through to the end ;)
>
> As a minimal change, I tried to add "task_struct *leader_proxy" to
> signal_struct, which points to the next live thread and is changed by
> exit_notify(). eligible_child() checks it instead of ->exit_signal.
> But this is so messy...
>
> And in fact, if we are talking about group stop, it is a group operation,
> so why does do_wait() use per-thread ->exit_code and not ->group_exit_code?

Good question.  We would need a fallback for the case where it isn't a
group operation, as in exit, but that might clean something up.

> But yes, [PATCH 3/3] adds a visible difference, and I don't know if
> this difference is good or bad.
>
>       $ sleep 1000
>
>       [1]+  Stopped                 sleep 1000
>       $ strace -p `pidof sleep`
>       Process 432 attached - interrupt to quit
>
> Now strace "hangs" in do_wait() because ->exit_code was eaten by the
> shell. We need SIGCONT.
>
> With the "[PATCH 3/3]" strace proceeds happily.
>
> Oleg.

Well, I got to playing with the idea of actually moving group_leader,
and it turns out that while it is a pain it isn't actually that bad.
The worst part is that we do not really change the pid of the leader
to the pid of the entire thread group, as there are a few cases where
we currently reference task_pid when we really want task_tgid.
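
To make the leadership move concrete, here is a rough userspace model
of the hand-off that the do_exit hunk in the patch performs.  It is
only a sketch: the toy_* names and TOY_PF_EXITING are made up for
illustration, and it leaves out tasklist_lock, the pgid/sid transfer
and the exit_signal hand-over.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PF_EXITING 0x1      /* stand-in for PF_EXITING */

/* Toy thread: a circular singly linked list models the thread group. */
struct toy_thread {
	int tid;
	unsigned int flags;
	struct toy_thread *next;          /* next thread in the group */
	struct toy_thread *group_leader;  /* current leader of the group */
};

/* Return the first thread after tsk that is not exiting, or NULL. */
static struct toy_thread *toy_pick_new_leader(struct toy_thread *tsk)
{
	struct toy_thread *t;

	for (t = tsk->next; t != tsk; t = t->next)
		if (!(t->flags & TOY_PF_EXITING))
			return t;
	return NULL;
}

/* Mark tsk as exiting and repoint every thread at the new leader. */
static struct toy_thread *toy_transfer_leadership(struct toy_thread *tsk)
{
	struct toy_thread *new_leader, *t;

	tsk->flags |= TOY_PF_EXITING;
	new_leader = toy_pick_new_leader(tsk);
	if (!new_leader)
		return NULL;    /* everybody is exiting, nothing to hand over */

	tsk->group_leader = new_leader;
	for (t = tsk->next; t != tsk; t = t->next)
		t->group_leader = new_leader;
	return new_leader;
}

/* Three threads; 101 is already exiting, so 102 must become leader. */
static int toy_demo_leader_handoff(void)
{
	struct toy_thread a = { 100, 0, NULL, NULL };
	struct toy_thread b = { 101, TOY_PF_EXITING, NULL, NULL };
	struct toy_thread c = { 102, 0, NULL, NULL };
	struct toy_thread *nl;

	a.next = &b; b.next = &c; c.next = &a;
	a.group_leader = b.group_leader = c.group_leader = &a;

	nl = toy_transfer_leadership(&a);
	assert(nl != NULL);
	return nl == &c && a.group_leader == &c &&
	       b.group_leader == &c && c.group_leader == &c;
}
```

With this shape, group_leader->exit_state always describes a live
thread for as long as any thread in the group is alive.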

Oleg, below is my proof-of-concept patch, which really needs to be
broken up into a whole patch series so the changes are small
enough that we can do a thorough audit on them.  Anyway, take a look
and see what you think.

This patch does fix your weird test case without any actual
change to the do_wait logic itself.

The key idea is that by not making PIDTYPE_PID a hash chain, we
can point two different struct pids at the same process, allowing
two different pids to return the same process from
pid_task(pid, PIDTYPE_PID).
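
As a small illustration of that idea, here is a userspace model in
which each pid carries a single task back-pointer instead of a hash
chain.  The toy_* names are invented for this sketch and are not the
kernel structures; only the shape of the lookup is meant to match.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, not kernel code. */
struct toy_task {
	const char *comm;
};

struct toy_pid {
	int nr;                 /* numeric id */
	struct toy_task *tsk;   /* single back-pointer, no hash chain */
};

/* Analogue of pid_task(pid, PIDTYPE_PID): just follow the pointer. */
static struct toy_task *toy_pid_task(const struct toy_pid *pid)
{
	return pid ? pid->tsk : NULL;
}

/* Point a pid at whichever task should now answer for it. */
static void toy_pid_retarget(struct toy_pid *pid, struct toy_task *tsk)
{
	pid->tsk = tsk;
}

/*
 * The leader exits and its pid (the tgid) is redirected to a
 * surviving thread, which also keeps its own tid.  Returns 1 if
 * both pids now resolve to the survivor.
 */
static int toy_demo_two_pids_one_task(void)
{
	struct toy_task leader = { "leader" };
	struct toy_task survivor = { "thread" };
	struct toy_pid tgid = { 100, &leader };
	struct toy_pid tid = { 101, &survivor };

	assert(toy_pid_task(&tgid) == &leader);
	toy_pid_retarget(&tgid, &survivor);

	return toy_pid_task(&tgid) == &survivor &&
	       toy_pid_task(&tid) == &survivor;
}
```

A hash chain could not express this, because a task has only one list
node per pid type and so can sit on only one pid's chain at a time.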

Which means most things continue to just work by working on
PIDTYPE_PID, although as mentioned previously there are a few
things, particularly do_notify_parent_cldstop and do_wait, that
need to be changed to return the tgid instead of the pid.

Oh, and in eligible_child() the PIDTYPE_PID test is now sneaky,
essentially doing a task lookup and seeing if the result
is the task we are examining, instead of comparing pids.
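
To show why the lookup matters, here is a toy comparison of the two
ways of matching.  The model_* names are hypothetical and the structs
are simplified stand-ins; the only point is that a pid redirected to
a task matches by lookup even though it is not that task's own pid.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins, not the kernel structures. */
struct model_task {
	int exit_state;
};

struct model_pid {
	int nr;
	struct model_task *tsk;
};

/* Old style: compare the wanted pid with the task's own pid. */
static int model_match_by_pid(const struct model_pid *wanted,
			      const struct model_pid *own)
{
	return wanted == own;
}

/* New style: look the task up through the pid and compare tasks. */
static int model_match_by_task(const struct model_pid *wanted,
			       const struct model_task *p)
{
	return wanted != NULL && wanted->tsk == p;
}

/* Returns 1 if both the tid and the redirected tgid match by lookup. */
static int model_demo_match(void)
{
	struct model_task survivor = { 0 };
	struct model_pid tid = { 101, &survivor };   /* thread's own pid */
	struct model_pid tgid = { 100, &survivor };  /* group pid, redirected */

	/* A parent waiting on the tgid: pid comparison misses it. */
	assert(!model_match_by_pid(&tgid, &tid));

	return model_match_by_task(&tgid, &survivor) &&
	       model_match_by_task(&tid, &survivor);
}
```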

The funny part is that the pid shown by grep pid /proc/<tgid>/status
no longer always equals the tgid after the original leader exits.
Still, that seems better than making the entire thread group look
like a zombie just because the wrong thread exited.

Subject: [PATCH] Allow thread group leaders to exit

---
 fs/exec.c                 |   81 ++---------------------
 fs/fcntl.c                |   20 ++++--
 fs/proc/base.c            |    6 +-
 include/linux/init_task.h |   25 ++++----
 include/linux/pid.h       |   14 ++---
 include/linux/sched.h     |   43 ++++++------
 kernel/exit.c             |  157 +++++++++++++++++++++++++++-----------------
 kernel/fork.c             |    2 +-
 kernel/itimer.c           |    2 +-
 kernel/pid.c              |   60 +++++++++++------
 kernel/signal.c           |   23 ++-----
 11 files changed, 204 insertions(+), 229 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 14a690d..1f69326 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -786,22 +786,6 @@ static int de_thread(struct task_struct *tsk)
         * Account for the thread group leader hanging around:
         */
        count = 1;
-       if (!thread_group_leader(tsk)) {
-               count = 2;
-               /*
-                * The SIGALRM timer survives the exec, but needs to point
-                * at us as the new group leader now.  We have a race with
-                * a timer firing now getting the old leader, so we need to
-                * synchronize with any firing (by calling del_timer_sync)
-                * before we can safely let the old group leader die.
-                */
-               sig->tsk = tsk;
-               spin_unlock_irq(lock);
-               if (hrtimer_cancel(&sig->real_timer))
-                       hrtimer_restart(&sig->real_timer);
-               spin_lock_irq(lock);
-       }
-
        sig->notify_count = count;
        while (atomic_read(&sig->count) > count) {
                __set_current_state(TASK_UNINTERRUPTIBLE);
@@ -811,68 +795,15 @@ static int de_thread(struct task_struct *tsk)
        }
        spin_unlock_irq(lock);
 
-       /*
-        * At this point all other threads have exited, all we have to
-        * do is to wait for the thread group leader to become inactive,
-        * and to assume its PID:
-        */
-       if (!thread_group_leader(tsk)) {
-               leader = tsk->group_leader;
-
-               sig->notify_count = -1;
-               for (;;) {
-                       write_lock_irq(&tasklist_lock);
-                       if (likely(leader->exit_state))
-                               break;
-                       __set_current_state(TASK_UNINTERRUPTIBLE);
-                       write_unlock_irq(&tasklist_lock);
-                       schedule();
+       /* If it isn't already force gettid() == getpid() */
+       if (sig->tgid != tsk->tid) {
+               write_lock_irq(&tasklist_lock);
+               if (sig->tgid != tsk->tid) {
+                       detach_pid(tsk, PIDTYPE_PID);
+                       attach_pid(tsk, PIDTYPE_PID, sig->tgid);
                }
-
-               /*
-                * The only record we have of the real-time age of a
-                * process, regardless of execs it's done, is start_time.
-                * All the past CPU time is accumulated in signal_struct
-                * from sister threads now dead.  But in this non-leader
-                * exec, nothing survives from the original leader thread,
-                * whose birth marks the true age of this process now.
-                * When we take on its identity by switching to its PID, we
-                * also take its birthdate (always earlier than our own).
-                */
-               tsk->start_time = leader->start_time;
-
-               BUG_ON(!same_thread_group(leader, tsk));
-               BUG_ON(has_group_leader_pid(tsk));
-               /*
-                * An exec() starts a new thread group with the
-                * TGID of the previous thread group. Rehash the
-                * two threads with a switched PID, and release
-                * the former thread group leader:
-                */
-
-               /* Become a process group leader with the old leader's pid.
-                * The old leader becomes a thread of the this thread group.
-                * Note: The old leader also uses this pid until release_task
-                *       is called.  Odd but simple and correct.
-                */
-               detach_pid(tsk, PIDTYPE_PID);
-               tsk->pid = leader->pid;
-               attach_pid(tsk, PIDTYPE_PID,  task_pid(leader));
-               transfer_pid(leader, tsk, PIDTYPE_PGID);
-               transfer_pid(leader, tsk, PIDTYPE_SID);
-               list_replace_rcu(&leader->tasks, &tsk->tasks);
-
-               tsk->group_leader = tsk;
-               leader->group_leader = tsk;
-
-               tsk->exit_signal = SIGCHLD;
-
-               BUG_ON(leader->exit_state != EXIT_ZOMBIE);
-               leader->exit_state = EXIT_DEAD;
-
                write_unlock_irq(&tasklist_lock);
        }
-
        sig->group_exit_task = NULL;
        sig->notify_count = 0;
 
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 8685263..bc0a125 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -516,9 +516,13 @@ void send_sigio(struct fown_struct *fown, int fd, int band)
                goto out_unlock_fown;
        
        read_lock(&tasklist_lock);
-       do_each_pid_task(pid, type, p) {
-               send_sigio_to_task(p, fown, fd, band);
-       } while_each_pid_task(pid, type, p);
+       if (type == PIDTYPE_PID)
+               send_sigio_to_task(pid_task(pid, type), fown, fd, band);
+       else {
+               do_each_pid_task(pid, type, p) {
+                       send_sigio_to_task(p, fown, fd, band);
+               } while_each_pid_task(pid, type, p);
+       }
        read_unlock(&tasklist_lock);
  out_unlock_fown:
        read_unlock(&fown->lock);
@@ -547,9 +551,13 @@ int send_sigurg(struct fown_struct *fown)
        ret = 1;
        
        read_lock(&tasklist_lock);
-       do_each_pid_task(pid, type, p) {
-               send_sigurg_to_task(p, fown);
-       } while_each_pid_task(pid, type, p);
+       if (type == PIDTYPE_PID)
+               send_sigurg_to_task(pid_task(pid, type), fown);
+       else {
+               do_each_pid_task(pid, type, p) {
+                       send_sigurg_to_task(p, fown);
+               } while_each_pid_task(pid, type, p);
+       }
        read_unlock(&tasklist_lock);
  out_unlock_fown:
        read_unlock(&fown->lock);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d59708e..f7bd620 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2438,15 +2438,15 @@ retry:
                 * pid of a thread_group_leader.  Testing for task
                 * being a thread_group_leader is the obvious thing
                 * todo but there is a window when it fails, due to
-                * the pid transfer logic in de_thread.
+                * the pid transfer logic at group leader death.
                 *
                 * So we perform the straight forward test of seeing
-                * if the pid we have found is the pid of a thread
+                * if the pid we have found is the pid of the thread
                 * group leader, and don't worry if the task we have
                 * found doesn't happen to be a thread group leader.
                 * As we don't care in the case of readdir.
                 */
-               if (!iter.task || !has_group_leader_pid(iter.task)) {
+               if (!iter.task || pid != task_tgid(iter.task)) {
                        iter.tgid += 1;
                        goto retry;
                }
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 96be7d6..ddcd7c1 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -67,6 +67,9 @@
        .posix_timers    = LIST_HEAD_INIT(sig.posix_timers),            \
        .cpu_timers     = INIT_CPU_TIMERS(sig.cpu_timers),              \
        .rlim           = INIT_RLIMITS,                                 \
+       .tgid           = &init_struct_pid,                             \
+       .pids[PIDTYPE_PGID]     = &init_struct_pid,                     \
+       .pids[PIDTYPE_SID]      = &init_struct_pid,                     \
 }
 
 extern struct nsproxy init_nsproxy;
@@ -91,10 +94,10 @@ extern struct group_info init_groups;
 
 #define INIT_STRUCT_PID {                                              \
        .count          = ATOMIC_INIT(1),                               \
+       .tsk            = &init_task,                                   \
        .tasks          = {                                             \
-               { .first = &init_task.pids[PIDTYPE_PID].node },         \
-               { .first = &init_task.pids[PIDTYPE_PGID].node },        \
-               { .first = &init_task.pids[PIDTYPE_SID].node },         \
+               { .first = &init_task.pids[PIDTYPE_PGID] },             \
+               { .first = &init_task.pids[PIDTYPE_SID] },              \
        },                                                              \
        .rcu            = RCU_HEAD_INIT,                                \
        .level          = 0,                                            \
@@ -105,13 +108,10 @@ extern struct group_info init_groups;
        }, }                                                            \
 }
 
-#define INIT_PID_LINK(type)                                    \
-{                                                              \
-       .node = {                                               \
-               .next = NULL,                                   \
-               .pprev = &init_struct_pid.tasks[type].first,    \
-       },                                                      \
-       .pid = &init_struct_pid,                                \
+#define INIT_PID_HLIST_NODE(type)                      \
+{                                                      \
+       .next = NULL,                                   \
+       .pprev = &init_struct_pid.tasks[type].first,    \
 }
 
 #ifdef CONFIG_SECURITY_FILE_CAPABILITIES
@@ -179,9 +179,8 @@ extern struct group_info init_groups;
        .fs_excl        = ATOMIC_INIT(0),                               \
        .pi_lock        = __SPIN_LOCK_UNLOCKED(tsk.pi_lock),            \
        .pids = {                                                       \
-               [PIDTYPE_PID]  = INIT_PID_LINK(PIDTYPE_PID),            \
-               [PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID),           \
-               [PIDTYPE_SID]  = INIT_PID_LINK(PIDTYPE_SID),            \
+               [PIDTYPE_PGID] = INIT_PID_HLIST_NODE(PIDTYPE_PGID),     \
+               [PIDTYPE_SID]  = INIT_PID_HLIST_NODE(PIDTYPE_SID),      \
        },                                                              \
        .dirties = INIT_PROP_LOCAL_SINGLE(dirties),                     \
        INIT_TRACE_IRQFLAGS                                             \
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 061abb6..828355e 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -5,9 +5,10 @@
 
 enum pid_type
 {
-       PIDTYPE_PID,
        PIDTYPE_PGID,
        PIDTYPE_SID,
+       PIDTYPE_PID,
+#define PIDTYPE_ARRAY_MAX PIDTYPE_PID
        PIDTYPE_MAX
 };
 
@@ -58,7 +59,8 @@ struct pid
 {
        atomic_t count;
        /* lists of tasks that use this pid */
-       struct hlist_head tasks[PIDTYPE_MAX];
+       struct task_struct *tsk;
+       struct hlist_head tasks[PIDTYPE_ARRAY_MAX];
        struct rcu_head rcu;
        int level;
        struct upid numbers[1];
@@ -66,12 +68,6 @@ struct pid
 
 extern struct pid init_struct_pid;
 
-struct pid_link
-{
-       struct hlist_node node;
-       struct pid *pid;
-};
-
 static inline struct pid *get_pid(struct pid *pid)
 {
        if (pid)
@@ -158,7 +154,7 @@ static inline pid_t pid_vnr(struct pid *pid)
                struct hlist_node *pos___;                              \
                if (pid != NULL)                                        \
                        hlist_for_each_entry_rcu((task), pos___,        \
-                               &pid->tasks[type], pids[type].node) {
+                               &pid->tasks[type], pids[type]) {
 
 #define while_each_pid_task(pid, type, task)                           \
                        }                                               \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1b1e25b..496dfda 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -453,7 +453,6 @@ struct signal_struct {
 
        /* ITIMER_REAL timer for the process */
        struct hrtimer real_timer;
-       struct task_struct *tsk;
        ktime_t it_real_incr;
 
        /* ITIMER_PROF and ITIMER_VIRTUAL timers for the process */
@@ -461,6 +460,8 @@ struct signal_struct {
        cputime_t it_prof_incr, it_virt_incr;
 
        /* job control IDs */
+       struct pid *tgid;
+       struct pid *pids[PIDTYPE_ARRAY_MAX];
 
        /*
         * pgrp and session fields are deprecated.
@@ -1034,8 +1035,9 @@ struct task_struct {
        struct list_head sibling;       /* linkage in my parent's children list */
        struct task_struct *group_leader;       /* threadgroup leader */
 
+       struct pid *tid;
        /* PID/PID hash table linkage. */
-       struct pid_link pids[PIDTYPE_MAX];
+       struct hlist_node pids[PIDTYPE_ARRAY_MAX];
        struct list_head thread_group;
 
        struct completion *vfork_done;          /* for vfork() */
@@ -1261,22 +1263,34 @@ static inline void set_task_pgrp(struct task_struct *tsk, pid_t pgrp)
 
 static inline struct pid *task_pid(struct task_struct *task)
 {
-       return task->pids[PIDTYPE_PID].pid;
+       return task->tid;
 }
 
 static inline struct pid *task_tgid(struct task_struct *task)
 {
-       return task->group_leader->pids[PIDTYPE_PID].pid;
+       struct signal_struct *sig = rcu_dereference(task->signal);
+       struct pid *pid = NULL;
+       if (sig)
+               pid = sig->tgid;
+       return pid;
 }
 
 static inline struct pid *task_pgrp(struct task_struct *task)
 {
-       return task->group_leader->pids[PIDTYPE_PGID].pid;
+       struct signal_struct *sig = rcu_dereference(task->signal);
+       struct pid *pid = NULL;
+       if (sig)
+               pid = sig->pids[PIDTYPE_PGID];
+       return pid;
 }
 
 static inline struct pid *task_session(struct task_struct *task)
 {
-       return task->group_leader->pids[PIDTYPE_SID].pid;
+       struct signal_struct *sig = rcu_dereference(task->signal);
+       struct pid *pid = NULL;
+       if (sig)
+               pid = sig->pids[PIDTYPE_SID];
+       return pid;
 }
 
 struct pid_namespace;
@@ -1371,7 +1385,7 @@ static inline pid_t task_ppid_nr_ns(struct task_struct *tsk,
  */
 static inline int pid_alive(struct task_struct *p)
 {
-       return p->pids[PIDTYPE_PID].pid != NULL;
+       return p->signal != NULL;
 }
 
 /**
@@ -1652,7 +1666,6 @@ extern void block_all_signals(int (*notifier)(void *priv), void *priv,
 extern void unblock_all_signals(void);
 extern void release_task(struct task_struct * p);
 extern int send_sig_info(int, struct siginfo *, struct task_struct *);
-extern int send_group_sig_info(int, struct siginfo *, struct task_struct *);
 extern int force_sigsegv(int, struct task_struct *);
 extern int force_sig_info(int, struct siginfo *, struct task_struct *);
 extern int __kill_pgrp_info(int sig, struct siginfo *info, struct pid *pgrp);
@@ -1772,17 +1785,6 @@ extern void wait_task_inactive(struct task_struct * p);
 /* de_thread depends on thread_group_leader not being a pid based check */
 #define thread_group_leader(p) (p == p->group_leader)
 
-/* Do to the insanities of de_thread it is possible for a process
- * to have the pid of the thread group leader without actually being
- * the thread group leader.  For iteration through the pids in proc
- * all we care about is that we have a task with the appropriate
- * pid, we don't actually care if we have the right task.
- */
-static inline int has_group_leader_pid(struct task_struct *p)
-{
-       return p->pid == p->tgid;
-}
-
 static inline
 int same_thread_group(struct task_struct *p1, struct task_struct *p2)
 {
@@ -1800,9 +1802,6 @@ static inline int thread_group_empty(struct task_struct *p)
        return list_empty(&p->thread_group);
 }
 
-#define delay_group_leader(p) \
-               (thread_group_leader(p) && !thread_group_empty(p))
-
 /*
  * Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
  * subscriptions and synchronises with wait4().  Also used in procfs.  Also
diff --git a/kernel/exit.c b/kernel/exit.c
index 1ab19f0..94552e0 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -57,7 +57,6 @@ static void exit_mm(struct task_struct * tsk);
 static void __unhash_process(struct task_struct *p)
 {
        nr_threads--;
-       detach_pid(p, PIDTYPE_PID);
        if (thread_group_leader(p)) {
                detach_pid(p, PIDTYPE_PGID);
                detach_pid(p, PIDTYPE_SID);
@@ -65,6 +64,7 @@ static void __unhash_process(struct task_struct *p)
                list_del_rcu(&p->tasks);
                __get_cpu_var(process_counts)--;
        }
+       detach_pid(p, PIDTYPE_PID);
        list_del_rcu(&p->thread_group);
        remove_parent(p);
 }
@@ -144,44 +144,15 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
 
 void release_task(struct task_struct * p)
 {
-       struct task_struct *leader;
-       int zap_leader;
-repeat:
        atomic_dec(&p->user->processes);
        proc_flush_task(p);
        write_lock_irq(&tasklist_lock);
        ptrace_unlink(p);
        BUG_ON(!list_empty(&p->ptrace_list) || !list_empty(&p->ptrace_children));
        __exit_signal(p);
-
-       /*
-        * If we are the last non-leader member of the thread
-        * group, and the leader is zombie, then notify the
-        * group leader's parent process. (if it wants notification.)
-        */
-       zap_leader = 0;
-       leader = p->group_leader;
-       if (leader != p && thread_group_empty(leader) && leader->exit_state == EXIT_ZOMBIE) {
-               BUG_ON(leader->exit_signal == -1);
-               do_notify_parent(leader, leader->exit_signal);
-               /*
-                * If we were the last child thread and the leader has
-                * exited already, and the leader's parent ignores SIGCHLD,
-                * then we are the one who should release the leader.
-                *
-                * do_notify_parent() will have marked it self-reaping in
-                * that case.
-                */
-               zap_leader = (leader->exit_signal == -1);
-       }
-
        write_unlock_irq(&tasklist_lock);
        release_thread(p);
        call_rcu(&p->rcu, delayed_put_task_struct);
-
-       p = leader;
-       if (unlikely(zap_leader))
-               goto repeat;
 }
 
 /*
@@ -633,8 +604,7 @@ reparent_thread(struct task_struct *p, struct task_struct *father, int traced)
        /* If we'd notified the old parent about this child's death,
         * also notify the new parent.
         */
-       if (!traced && p->exit_state == EXIT_ZOMBIE &&
-           p->exit_signal != -1 && thread_group_empty(p))
+       if (!traced && p->exit_state == EXIT_ZOMBIE && p->exit_signal != -1)
                do_notify_parent(p, p->exit_signal);
 
        /*
@@ -702,8 +672,7 @@ static void forget_original_parent(struct task_struct *father)
                } else {
                        /* reparent ptraced task to its real parent */
                        __ptrace_unlink (p);
-                       if (p->exit_state == EXIT_ZOMBIE && p->exit_signal != -1 &&
-                           thread_group_empty(p))
+                       if (p->exit_state == EXIT_ZOMBIE && p->exit_signal != -1)
                                do_notify_parent(p, p->exit_signal);
                }
 
@@ -773,6 +742,11 @@ static void exit_notify(struct task_struct *tsk)
        exit_task_namespaces(tsk);
 
        write_lock_irq(&tasklist_lock);
+       /* If we haven't yet made gettid() == getpid() do so now */
+       if (thread_group_leader(tsk) && (tsk->tid != tsk->signal->tgid)) {
+               detach_pid(tsk, PIDTYPE_PID);
+               attach_pid(tsk, PIDTYPE_PID, tsk->signal->tgid);
+       }
        /*
         * Check to see if any process groups have become orphaned
         * as a result of our exiting, and if they have any stopped
@@ -818,7 +792,7 @@ static void exit_notify(struct task_struct *tsk)
         * send it a SIGCHLD instead of honoring exit_signal.  exit_signal
         * only has special meaning to our real parent.
         */
-       if (tsk->exit_signal != -1 && thread_group_empty(tsk)) {
+       if (tsk->exit_signal != -1) {
                int signal = tsk->parent == tsk->real_parent ? tsk->exit_signal : SIGCHLD;
                do_notify_parent(tsk, signal);
        } else if (tsk->ptrace) {
@@ -946,6 +920,48 @@ fastcall NORET_TYPE void do_exit(long code)
        }
 
        tsk->flags |= PF_EXITING;
+       /* Transfer thread group leadership */
+       if (thread_group_leader(tsk) && !thread_group_empty(tsk)) {
+               struct task_struct *new_leader, *t;
+               write_lock_irq(&tasklist_lock);
+               for (t = next_thread(tsk); t != tsk; t = next_thread(t)) {
+                       if (!(t->flags & PF_EXITING))
+                               break;
+               }
+               if (t != tsk) {
+                       new_leader = t;
+               
+                       new_leader->start_time = tsk->start_time;
+                       task_pid(tsk)->tsk = new_leader;
+                       transfer_pid(tsk, new_leader, PIDTYPE_PGID);
+                       transfer_pid(tsk, new_leader, PIDTYPE_SID);
+                       list_replace_rcu(&tsk->tasks, &new_leader->tasks);
+
+                       /* Update group_leader on all of the threads... */
+                       new_leader->group_leader = new_leader;
+                       tsk->group_leader = new_leader;
+                       for (t = next_thread(tsk); t != tsk; t = next_thread(t)) {
+                               t->group_leader = new_leader;
+                       }
+
+                       new_leader->exit_signal = tsk->exit_signal;
+                       tsk->exit_signal = -1;
+
+                       write_unlock_irq(&tasklist_lock);
+               } else {
+                       write_unlock_irq(&tasklist_lock);
+                       /* Wait for the other threads to exit before continuing */
+                       for (;;) {
+                               read_lock(&tasklist_lock);
+                               if (thread_group_empty(tsk))
+                                       break;
+                               __set_current_state(TASK_UNINTERRUPTIBLE);
+                               read_unlock(&tasklist_lock);
+                               schedule();
+                       }
+                       read_unlock(&tasklist_lock);
+               }
+       }
        /*
         * tsk->flags are checked in the futex code to protect against
         * an exiting task cleaning up the robust pi futexes.
@@ -1106,20 +1122,18 @@ asmlinkage void sys_exit_group(int error_code)
        do_group_exit((error_code & 0xff) << 8);
 }
 
-static int eligible_child(pid_t pid, int options, struct task_struct *p)
+static int eligible_child(enum pid_type type, struct pid *pid, int options, struct task_struct *p)
 {
        int err;
-       struct pid_namespace *ns;
 
-       ns = current->nsproxy->pid_ns;
-       if (pid > 0) {
-               if (task_pid_nr_ns(p, ns) != pid)
+       if (type == PIDTYPE_PID) {
+               /* Match all pids pointing at task p */
+               if (pid_task(pid, PIDTYPE_PID) != p)
                        return 0;
-       } else if (!pid) {
-               if (task_pgrp_nr_ns(p, ns) != task_pgrp_vnr(current))
-                       return 0;
-       } else if (pid != -1) {
-               if (task_pgrp_nr_ns(p, ns) != -pid)
+       } else if (type < PIDTYPE_MAX) {
+               struct signal_struct *sig;
+               sig = rcu_dereference(p->signal);
+               if (sig && (sig->pids[type] != pid))
                        return 0;
        }
 
@@ -1346,7 +1360,8 @@ static int wait_task_stopped(struct task_struct *p,
 {
        int retval, exit_code, why;
        uid_t uid = 0; /* unneeded, required by compiler */
-       pid_t pid;
+       struct pid *pid;
+       pid_t upid;
 
        exit_code = 0;
        spin_lock_irq(&p->sighand->siglock);
@@ -1382,12 +1397,16 @@ unlock_sig:
         * possibly take page faults for user memory.
         */
        get_task_struct(p);
-       pid = task_pid_nr_ns(p, current->nsproxy->pid_ns);
+       if (p->ptrace && same_thread_group(current, p->parent))
+               pid = task_pid(p);
+       else
+               pid = task_tgid(p);
+       upid = pid_nr_ns(pid, current->nsproxy->pid_ns);
        why = (p->ptrace & PT_PTRACED) ? CLD_TRAPPED : CLD_STOPPED;
        read_unlock(&tasklist_lock);
 
        if (unlikely(noreap))
-               return wait_noreap_copyout(p, pid, uid,
+               return wait_noreap_copyout(p, upid, uid,
                                           why, exit_code,
                                           infop, ru);
 
@@ -1403,11 +1422,11 @@ unlock_sig:
        if (!retval && infop)
                retval = put_user(exit_code, &infop->si_status);
        if (!retval && infop)
-               retval = put_user(pid, &infop->si_pid);
+               retval = put_user(upid, &infop->si_pid);
        if (!retval && infop)
                retval = put_user(uid, &infop->si_uid);
        if (!retval)
-               retval = pid;
+               retval = upid;
        put_task_struct(p);
 
        BUG_ON(!retval);
@@ -1425,7 +1444,8 @@ static int wait_task_continued(struct task_struct *p, int noreap,
                               int __user *stat_addr, struct rusage __user *ru)
 {
        int retval;
-       pid_t pid;
+       struct pid *pid;
+       pid_t upid;
        uid_t uid;
 
        if (!(p->signal->flags & SIGNAL_STOP_CONTINUED))
@@ -1440,8 +1460,11 @@ static int wait_task_continued(struct task_struct *p, int noreap,
        if (!noreap)
                p->signal->flags &= ~SIGNAL_STOP_CONTINUED;
        spin_unlock_irq(&p->sighand->siglock);
-
-       pid = task_pid_nr_ns(p, current->nsproxy->pid_ns);
+       if (p->ptrace && same_thread_group(current, p->parent))
+               pid = task_pid(p);
+       else
+               pid = task_tgid(p);
+       upid = pid_nr_ns(pid, current->nsproxy->pid_ns);
        uid = p->uid;
        get_task_struct(p);
        read_unlock(&tasklist_lock);
@@ -1452,9 +1475,9 @@ static int wait_task_continued(struct task_struct *p, int noreap,
                if (!retval && stat_addr)
                        retval = put_user(0xffff, stat_addr);
                if (!retval)
-                       retval = pid;
+                       retval = upid;
        } else {
-               retval = wait_noreap_copyout(p, pid, uid,
+               retval = wait_noreap_copyout(p, upid, uid,
                                             CLD_CONTINUED, SIGCONT,
                                             infop, ru);
                BUG_ON(retval == 0);
@@ -1463,13 +1486,25 @@ static int wait_task_continued(struct task_struct *p, int noreap,
        return retval;
 }
 
-static long do_wait(pid_t pid, int options, struct siginfo __user *infop,
+static long do_wait(pid_t upid, int options, struct siginfo __user *infop,
                    int __user *stat_addr, struct rusage __user *ru)
 {
        DECLARE_WAITQUEUE(wait, current);
        struct task_struct *tsk;
        int flag, retval;
-
+       struct pid *pid = NULL;
+       enum pid_type type = PIDTYPE_MAX;
+
+       if (upid > 0) {
+               type = PIDTYPE_PID;
+               pid = find_get_pid(upid);
+       } else if (upid == 0) {
+               type = PIDTYPE_PGID;
+               pid = get_pid(task_pgrp(current));
+       } else if (upid < -1) {
+               type = PIDTYPE_PGID;
+               pid = find_get_pid(-upid);
+       }
        add_wait_queue(&current->signal->wait_chldexit,&wait);
 repeat:
        /*
@@ -1484,7 +1519,7 @@ repeat:
                struct task_struct *p;
 
                list_for_each_entry(p, &tsk->children, sibling) {
-                       int ret = eligible_child(pid, options, p);
+                       int ret = eligible_child(type, pid, options, p);
                        if (!ret)
                                continue;
 
@@ -1503,8 +1538,7 @@ repeat:
                                retval = wait_task_stopped(p,
                                                (options & WNOWAIT), infop,
                                                stat_addr, ru);
-                       } else if (p->exit_state == EXIT_ZOMBIE &&
-                                       !delay_group_leader(p)) {
+                       } else if (p->exit_state == EXIT_ZOMBIE) {
                                /*
                                 * We don't reap group leaders with subthreads.
                                 */
@@ -1531,7 +1565,7 @@ repeat:
                if (!flag) {
                        list_for_each_entry(p, &tsk->ptrace_children,
                                                                ptrace_list) {
-                               flag = eligible_child(pid, options, p);
+                               flag = eligible_child(type, pid, options, p);
                                if (!flag)
                                        continue;
                                if (likely(flag > 0))
@@ -1560,6 +1594,7 @@ repeat:
 end:
        current->state = TASK_RUNNING;
        remove_wait_queue(&current->signal->wait_chldexit,&wait);
+       put_pid(pid);
        if (infop) {
                if (retval > 0)
                        retval = 0;
diff --git a/kernel/fork.c b/kernel/fork.c
index 7abb592..e986be9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -883,7 +883,6 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
        hrtimer_init(&sig->real_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        sig->it_real_incr.tv64 = 0;
        sig->real_timer.function = it_real_fn;
-       sig->tsk = tsk;
 
        sig->it_virt_expires = cputime_zero;
        sig->it_virt_incr = cputime_zero;
@@ -1308,6 +1307,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
                        if (clone_flags & CLONE_NEWPID)
                                p->nsproxy->pid_ns->child_reaper = p;
 
+                       p->signal->tgid = pid;
                        p->signal->tty = current->signal->tty;
                        set_task_pgrp(p, task_pgrp_nr(current));
                        set_task_session(p, task_session_nr(current));
diff --git a/kernel/itimer.c b/kernel/itimer.c
index 2fab344..f40b589 100644
--- a/kernel/itimer.c
+++ b/kernel/itimer.c
@@ -132,7 +132,7 @@ enum hrtimer_restart it_real_fn(struct hrtimer *timer)
        struct signal_struct *sig =
                container_of(timer, struct signal_struct, real_timer);
 
-       send_group_sig_info(SIGALRM, SEND_SIG_PRIV, sig->tsk);
+       kill_pid_info(SIGALRM, SEND_SIG_PRIV, sig->tgid);
 
        return HRTIMER_NORESTART;
 }
diff --git a/kernel/pid.c b/kernel/pid.c
index 21f027c..b45b53d 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -319,28 +319,39 @@ EXPORT_SYMBOL_GPL(find_pid);
 int fastcall attach_pid(struct task_struct *task, enum pid_type type,
                struct pid *pid)
 {
-       struct pid_link *link;
-
-       link = &task->pids[type];
-       link->pid = pid;
-       hlist_add_head_rcu(&link->node, &pid->tasks[type]);
+       if (type == PIDTYPE_PID) {
+               task->tid = pid;
+               pid->tsk = task;
+       }
+       else {
+               task->signal->pids[type] = pid;
+               hlist_add_head_rcu(&task->pids[type], &pid->tasks[type]);
+       }
 
        return 0;
 }
 
 void fastcall detach_pid(struct task_struct *task, enum pid_type type)
 {
-       struct pid_link *link;
-       struct pid *pid;
+       struct pid **ppid, *pid;
        int tmp;
 
-       link = &task->pids[type];
-       pid = link->pid;
-
-       hlist_del_rcu(&link->node);
-       link->pid = NULL;
+       if (type == PIDTYPE_PID) {
+               ppid = &task->tid;
+               pid = *ppid;
+               if (pid->tsk == task)
+                       pid->tsk = NULL;
+       }
+       else {
+               hlist_del_rcu(&task->pids[type]);
+               ppid = &task->signal->pids[type];
+       }
+       pid = *ppid;
+       *ppid = NULL;
 
-       for (tmp = PIDTYPE_MAX; --tmp >= 0; )
+       if (pid->tsk)
+               return;
+       for (tmp = PIDTYPE_MAX - 1; --tmp >= 0; )
                if (!hlist_empty(&pid->tasks[tmp]))
                        return;
 
@@ -351,19 +362,22 @@ void fastcall detach_pid(struct task_struct *task, enum pid_type type)
 void fastcall transfer_pid(struct task_struct *old, struct task_struct *new,
                           enum pid_type type)
 {
-       new->pids[type].pid = old->pids[type].pid;
-       hlist_replace_rcu(&old->pids[type].node, &new->pids[type].node);
-       old->pids[type].pid = NULL;
+       hlist_replace_rcu(&old->pids[type], &new->pids[type]);
 }
 
 struct task_struct * fastcall pid_task(struct pid *pid, enum pid_type type)
 {
        struct task_struct *result = NULL;
        if (pid) {
-               struct hlist_node *first;
-               first = rcu_dereference(pid->tasks[type].first);
-               if (first)
-                       result = hlist_entry(first, struct task_struct, pids[(type)].node);
+               if (type == PIDTYPE_PID)
+                       result = rcu_dereference(pid->tsk);
+               else {
+                       struct hlist_node *first;
+                       first = rcu_dereference(pid->tasks[type].first);
+                       if (first)
+                               result = hlist_entry(first, struct task_struct,
+                                                    pids[(type)]);
+               }
        }
        return result;
 }
@@ -402,7 +416,11 @@ struct pid *get_task_pid(struct task_struct *task, enum pid_type type)
 {
        struct pid *pid;
        rcu_read_lock();
-       pid = get_pid(task->pids[type].pid);
+       if (type == PIDTYPE_PID)
+               pid = task->tid;
+       else
+               pid = task->signal->pids[type];
+       get_pid(pid);
        rcu_read_unlock();
        return pid;
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 06e663d..af8c49f 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1195,20 +1195,6 @@ send_sig(int sig, struct task_struct *p, int priv)
        return send_sig_info(sig, __si_special(priv), p);
 }
 
-/*
- * This is the entry point for "process-wide" signals.
- * They will go to an appropriate thread in the thread group.
- */
-int
-send_group_sig_info(int sig, struct siginfo *info, struct task_struct *p)
-{
-       int ret;
-       read_lock(&tasklist_lock);
-       ret = group_send_sig_info(sig, info, p);
-       read_unlock(&tasklist_lock);
-       return ret;
-}
-
 void
 force_sig(int sig, struct task_struct *p)
 {
@@ -1501,12 +1487,15 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
        unsigned long flags;
        struct task_struct *parent;
        struct sighand_struct *sighand;
+       struct pid *pid;
 
-       if (tsk->ptrace & PT_PTRACED)
+       if (tsk->ptrace & PT_PTRACED) {
                parent = tsk->parent;
-       else {
+               pid = task_pid(tsk);
+       } else {
                tsk = tsk->group_leader;
                parent = tsk->real_parent;
+               pid = task_tgid(tsk);
        }
 
        info.si_signo = SIGCHLD;
@@ -1515,7 +1504,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
         * see comment in do_notify_parent() abot the following 3 lines
         */
        rcu_read_lock();
-       info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns);
+       info.si_pid = pid_nr_ns(pid, parent->nsproxy->pid_ns);
        rcu_read_unlock();
 
        info.si_uid = tsk->uid;
-- 
1.5.3.rc6.17.g1911
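
As a reading aid for the do_wait() hunk above, here is a small user-space
sketch of how the upid argument selects what is waited for. The enum and
classify_wait_pid() are hypothetical names used only for illustration and
are not part of the patch. In the patch itself the result is an
enum pid_type plus a referenced struct pid, and for upid == -1 the type is
left at PIDTYPE_MAX with a NULL pid, which presumably means "any child".

```c
#include <sys/types.h>

/* Hypothetical labels for the four cases handled in do_wait(). */
enum wait_target {
	WAIT_FOR_ANY,       /* upid == -1: no pid lookup is done          */
	WAIT_FOR_PID,       /* upid  >  0: PIDTYPE_PID,  pid  = upid      */
	WAIT_FOR_OWN_PGRP,  /* upid ==  0: PIDTYPE_PGID, caller's pgrp    */
	WAIT_FOR_PGRP       /* upid <  -1: PIDTYPE_PGID, pgrp = -upid     */
};

/* Mirrors the if/else chain the patch adds at the top of do_wait(). */
enum wait_target classify_wait_pid(pid_t upid)
{
	if (upid > 0)
		return WAIT_FOR_PID;
	if (upid == 0)
		return WAIT_FOR_OWN_PGRP;
	if (upid < -1)
		return WAIT_FOR_PGRP;
	return WAIT_FOR_ANY;
}
```

Doing the lookup once up front is what lets the patch take a single
reference on the struct pid and drop it with one put_pid() at the end
label, instead of comparing numeric pids for every child in the loop.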
