Re: [new PATCH] Re: 8139too: defunct threads

2001-04-16 Thread Andrew Morton

John Fremlin wrote:
> 
> 
> > So it seems that we must reparent the thread to init, and
> > make sure that it delivers SIGCHLD to init when it exits.
> 
> Sounds good. Why isn't SIGCHLD a stronger default anyway.

mm?   The caller gets to choose...

> [...]
> 
> > + /* Set the exit signal to SIGCHLD so we signal init on exit */
> > + if (this_task->exit_signal ! 0) {
> 
> Tyop.

aargh.  Thanks.  So that's what `cvs commit' does :)

The patch is OK with this converted into `!='.  But I'll
refresh and retest for tomorrow.  Still not very happy with
this approach though...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [new PATCH] Re: 8139too: defunct threads

2001-04-16 Thread John Fremlin

 Andrew Morton <[EMAIL PROTECTED]> writes:

[...]

> None of these will work.  The problems with globally setting
> exit_signal to SIGCHLD are that
> 
> a) If the parent does waitpid(pid, status, __WCLONE), the
>    waitpid will fail.  request_module() does this.  I don't
>    know _why_ it does this.  Maybe it's bogus.  There is no
>    explanation.

waitpid doesn't work on cloned children unless you put in __WCLONE or
__WALL, so this was necessary to catch the child at all. If you set to
use SIGCHLD this will no longer be needed (if I understand correctly).
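
For readers who want to see this from user space, here is a minimal sketch, assuming Linux with glibc. It starts a child with clone() and an exit signal of 0, then checks that a plain waitpid() fails with ECHILD while waitpid() with __WCLONE reaps the child. The helper names (child_body, spawn_child_without_exit_signal, plain_wait_misses_clone_child) are made up for illustration and appear nowhere in the kernel or in this thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Body of the child: return at once so the child exits immediately. */
static int child_body(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Start a child with clone() and no exit signal (the low byte of the
 * flags argument is 0).  Returns the child's pid, or -1 on failure.
 */
static pid_t spawn_child_without_exit_signal(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    pid_t pid;

    if (stack == NULL)
        return -1;
    /* The stack grows down on most architectures, so pass its top. */
    pid = clone(child_body, stack + CHILD_STACK_SIZE, 0, NULL);
    /* Without CLONE_VM the child runs on its own copy of this memory. */
    free(stack);
    return pid;
}

/*
 * Returns 1 if a plain waitpid() fails with ECHILD for such a child and
 * waitpid(..., __WCLONE) then reaps it, 0 if the plain wait succeeded,
 * and -1 on any other error.
 */
static int plain_wait_misses_clone_child(void)
{
    int status;
    pid_t pid = spawn_child_without_exit_signal();

    if (pid < 0)
        return -1;
    if (waitpid(pid, &status, 0) == pid)
        return 0;
    if (errno != ECHILD)
        return -1;
    return waitpid(pid, &status, __WCLONE) == pid ? 1 : -1;
}
```

Calling plain_wait_misses_clone_child() from a small main() should return 1 on a system that follows the behaviour documented for __WCLONE in waitpid(2).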

[...]

> So it seems that we must reparent the thread to init, and
> make sure that it delivers SIGCHLD to init when it exits.

Sounds good. Why isn't SIGCHLD a stronger default anyway.

[...]

> + /* Set the exit signal to SIGCHLD so we signal init on exit */
> + if (this_task->exit_signal ! 0) {

Tyop.

> + printk(KERN_ERR "task `%s' exit_signal %d in daemonize()\n",
> + this_task->comm, this_task->exit_signal);
> + }
> + this_task->exit_signal = SIGCHLD;
> +
> + write_unlock_irq(&tasklist_lock);
>  }
>  
>  void __init init_idle(void)
> 

-- 

http://www.penguinpowered.com/~vii



Re: [new PATCH] Re: 8139too: defunct threads

2001-04-16 Thread Andrew Morton

Manfred Spraul wrote:
> 
> I found the problem:
> 
> * init uses waitpid(-1,,), thus the __WALL flag is not set
> * without __WALL, only processes with exit_signal == SIGCHLD are reaped
> * it's impossible for user space processes to die with another
> exit_signal, forget_original_parent changes the exit_signal back to
> SIGCHLD ("We dont want people slaying init"), and init itself doesn't
> use clone.
> * kernel threads can die with an arbitrary exit_signal.

yep

> Alan, which fix would you prefer:
> * init could use wait3 and set __WALL.
> * all kernel thread users could set SIGCHLD. Some already do that
> (__call_usermodehelper).
> * the kernel_thread implementations could force the exit signal to
> SIGCHLD.

* Add SIGCHLD to all the users of kernel_thread(), in the cases
  where the thread can ever exit.

* Add
if (current->exit_signal == 0)
current->exit_signal = SIGCHLD;
  to daemonize().

None of these will work.  The problems with globally setting exit_signal
to SIGCHLD are that

a) If the parent does waitpid(pid, status, __WCLONE), the
   waitpid will fail.  request_module() does this.  I don't
   know _why_ it does this.  Maybe it's bogus.  There is no
   explanation.

b) When the kernel thread exits, it will send a SIGCHLD to
   its parent.  But the parent is not necessarily init!  It
   could be a userspace process (in this case, ifconfig).
   The kernel has no business spraying signals out to userspace
   tasks just because they happened to open a network interface.

   And for this reason we can't just go in and change 8139too.c
   to use SIGCHLD.

So it seems that we must reparent the thread to init, and
make sure that it delivers SIGCHLD to init when it exits.

The below patch does this, within daemonize().  But this precise
area was the source of serial screwups when I was doing the
call_usermodehelper() stuff, and I bet this approach will
still have problems.

The exit_files() will take care of releasing things like current->tty
and pwd.  But we still do not know what scheduling priority and policy
we inherited from the userspace parent, nor do we know what signal mask
we have, nor do we know what uid we're running as.  Resource limits.
Capabilities.  CPU mask.  Etcetera, etcetera.  All this stuff comes back
to bite.  Been there, got the scars :)

This is why I believe that kernel daemons should be launched by
keventd.  They belong to the *kernel*, not to userspace parents.



--- linux-2.4.4-pre3/kernel/sched.c Sun Apr 15 15:34:25 2001
+++ linux-akpm/kernel/sched.c   Sun Apr 15 21:59:26 2001
@@ -1260,32 +1260,53 @@
 /*
  * Put all the gunge required to become a kernel thread without
  * attached user resources in one place where it belongs.
+ *
+ * Kernel 2.4.4-pre3, [EMAIL PROTECTED]: reparent the caller
+ * to init and set the exit signal to SIGCHLD so the thread
+ * will be properly reaped if it exits.
  */
 
 void daemonize(void)
 {
    struct fs_struct *fs;
-
+   struct task_struct *this_task = current;
 
    /*
     * If we were started as result of loading a module, close all of the
     * user space pages.  We don't need them, and if we didn't close them
     * they would be locked into memory.
     */
-   exit_mm(current);
+   exit_mm(this_task);
 
-   current->session = 1;
-   current->pgrp = 1;
+   this_task->session = 1;
+   this_task->pgrp = 1;
 
    /* Become as one with the init task */
 
-   exit_fs(current);   /* current->fs->count--; */
+   exit_fs(this_task); /* this_task->fs->count--; */
    fs = init_task.fs;
-   current->fs = fs;
+   this_task->fs = fs;
    atomic_inc(&fs->count);
-   exit_files(current);
-   current->files = init_task.files;
-   atomic_inc(&current->files->count);
+   exit_files(this_task);
+   this_task->files = init_task.files;
+   atomic_inc(&this_task->files->count);
+
+   write_lock_irq(&tasklist_lock);
+
+   /* Reparent to init */
+   REMOVE_LINKS(this_task);
+   this_task->p_opptr = child_reaper;
+   this_task->p_pptr = child_reaper;
+   SET_LINKS(this_task);
+
+   /* Set the exit signal to SIGCHLD so we signal init on exit */
+   if (this_task->exit_signal ! 0) {
+       printk(KERN_ERR "task `%s' exit_signal %d in daemonize()\n",
+           this_task->comm, this_task->exit_signal);
+   }
+   this_task->exit_signal = SIGCHLD;
+
+   write_unlock_irq(&tasklist_lock);
 }
 
 void __init init_idle(void)
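
What the patch above changes for the calling thread can be pictured with a toy user-space model. This is only a sketch: struct toy_task and toy_reparent_to_reaper() are hypothetical names, and the real code also takes tasklist_lock and relinks the task with REMOVE_LINKS()/SET_LINKS(), which the model leaves out.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's task_struct. */
struct toy_task {
    const char *comm;           /* task name, as shown by ps */
    struct toy_task *parent;    /* who is notified when the task exits */
    int exit_signal;            /* signal sent to the parent on exit */
};

/*
 * Model of the reparenting step: hand the task to the reaper (init) and
 * force its exit signal to SIGCHLD.  Returns the previous exit signal so
 * that a caller can log unexpected values, as the patch does with printk().
 */
static int toy_reparent_to_reaper(struct toy_task *task,
                                  struct toy_task *reaper)
{
    int old_signal = task->exit_signal;

    task->parent = reaper;
    task->exit_signal = SIGCHLD;
    return old_signal;
}
```

In this model a thread started on behalf of ifconfig with exit signal 0 ends up with init as its parent and SIGCHLD as its exit signal, so ifconfig is never signalled and a plain waitpid() in init is able to reap it.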



Re: [new PATCH] Re: 8139too: defunct threads

2001-04-15 Thread Rod Stewart


On Sun, 15 Apr 2001, Manfred Spraul wrote:

> Alan, which fix would you prefer:
> * init could use wait3 and set __WALL.
> * all kernel thread users could set SIGCHLD. Some already do that
> (__call_usermodehelper).
> * the kernel_thread implementations could force the exit signal to
> SIGCHLD.
>
> I'd prefer the last version.
> The attached patch is tested with i386. The alpha, parisc and ppc
> assembler changes are guessed.

This patch fixed my problem.

Thanks,
-Rms




[new PATCH] Re: 8139too: defunct threads

2001-04-15 Thread Manfred Spraul

I found the problem:

* init uses waitpid(-1,,), thus the __WALL flag is not set
* without __WALL, only processes with exit_signal == SIGCHLD are reaped
* it's impossible for user space processes to die with another
exit_signal, forget_original_parent changes the exit_signal back to
SIGCHLD ("We dont want people slaying init"), and init itself doesn't
use clone.
* kernel threads can die with an arbitrary exit_signal.
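
The matching rule described in the list above can be written down as a small predicate. This is a sketch of the rule as stated in this thread, not the kernel source; OPT_WCLONE and OPT_WALL are illustrative stand-ins for __WCLONE and __WALL, and wait_considers_child() is a made-up name.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Illustrative stand-ins for the __WCLONE and __WALL wait flags. */
#define OPT_WCLONE 0x80000000u
#define OPT_WALL   0x40000000u

/*
 * Returns true if a waiter passing `options` would consider a dead child
 * whose exit signal is `exit_signal`.
 */
static bool wait_considers_child(int exit_signal, unsigned int options)
{
    /* A "clone" child is one that does not deliver SIGCHLD on exit. */
    bool is_clone_child = (exit_signal != SIGCHLD);
    bool wants_clone = (options & OPT_WCLONE) != 0;

    if (options & OPT_WALL)
        return true;                    /* __WALL matches every child */
    return is_clone_child == wants_clone;
}
```

With no flags, as in init's waitpid(-1,,), a child whose exit signal is 0 is never considered, which matches the defunct eth? threads reported here. The same predicate also shows why a waiter that passes only __WCLONE stops seeing a child once its exit signal becomes SIGCHLD.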

Alan, which fix would you prefer:
* init could use wait3 and set __WALL.
* all kernel thread users could set SIGCHLD. Some already do that
(__call_usermodehelper).
* the kernel_thread implementations could force the exit signal to
SIGCHLD.

I'd prefer the last version. 
The attached patch is tested with i386. The alpha, parisc and ppc
assembler changes are guessed.

--
Manfred

diff -ur 2.4/arch/alpha/kernel/entry.S build-2.4/arch/alpha/kernel/entry.S
--- 2.4/arch/alpha/kernel/entry.S   Sun Sep  3 20:36:45 2000
+++ build-2.4/arch/alpha/kernel/entry.S Sun Apr 15 14:58:01 2001
@@ -242,12 +242,12 @@
    subq    $30,4*8,$30
    stq $10,16($30)
    stq $9,8($30)
-   lda $0,CLONE_VM
+   lda $0,CLONE_VM|SIGCHLD
    stq $26,0($30)
    .prologue 1
    mov $16,$9  /* save fn */
    mov $17,$10 /* save arg */
-   or  $18,$0,$16  /* shuffle flags to front; add CLONE_VM.  */
+   or  $18,$0,$16  /* shuffle flags to front; add CLONE_VM|SIGCHLD. */
    bsr $26,kernel_clone
    bne $20,1f  /* $20 is non-zero in child */
    ldq $26,0($30)
diff -ur 2.4/arch/arm/kernel/process.c build-2.4/arch/arm/kernel/process.c
--- 2.4/arch/arm/kernel/process.c   Thu Feb 22 22:28:51 2001
+++ build-2.4/arch/arm/kernel/process.c Sun Apr 15 14:51:08 2001
@@ -368,6 +368,8 @@
 {
    pid_t __ret;
 
+   flags |= SIGCHLD;
+
    __asm__ __volatile__(
    "orr    r0, %1, %2  @ kernel_thread sys_clone
    mov r1, #0
diff -ur 2.4/arch/cris/kernel/process.c build-2.4/arch/cris/kernel/process.c
--- 2.4/arch/cris/kernel/process.c  Sat Apr  7 22:01:49 2001
+++ build-2.4/arch/cris/kernel/process.c    Sun Apr 15 14:51:16 2001
@@ -127,6 +127,8 @@
 int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
 {
    register long __a __asm__ ("r10");
+
+   flags |= SIGCHLD;
 
    __asm__ __volatile__
    ("movu.w %1,r9\n\t" /* r9 contains syscall number, to sys_clone */
diff -ur 2.4/arch/i386/kernel/process.c build-2.4/arch/i386/kernel/process.c
--- 2.4/arch/i386/kernel/process.c  Thu Feb 22 22:28:52 2001
+++ build-2.4/arch/i386/kernel/process.c    Sun Apr 15 14:40:43 2001
@@ -440,6 +440,8 @@
 {
    long retval, d0;
 
+   flags |= SIGCHLD;
+
    __asm__ __volatile__(
        "movl %%esp,%%esi\n\t"
        "int $0x80\n\t" /* Linux/i386 system call */
diff -ur 2.4/arch/ia64/kernel/process.c build-2.4/arch/ia64/kernel/process.c
--- 2.4/arch/ia64/kernel/process.c  Thu Jan  4 21:50:17 2001
+++ build-2.4/arch/ia64/kernel/process.c    Sun Apr 15 14:51:44 2001
@@ -500,7 +500,7 @@
    struct task_struct *parent = current;
    int result, tid;
 
-   tid = clone(flags | CLONE_VM, 0);
+   tid = clone(flags | CLONE_VM | SIGCHLD, 0);
    if (parent != current) {
        result = (*fn)(arg);
        _exit(result);
diff -ur 2.4/arch/m68k/kernel/process.c build-2.4/arch/m68k/kernel/process.c
--- 2.4/arch/m68k/kernel/process.c  Thu Feb 22 22:28:54 2001
+++ build-2.4/arch/m68k/kernel/process.c    Sun Apr 15 14:51:58 2001
@@ -135,7 +135,7 @@
 
    {
    register long retval __asm__ ("d0");
-   register long clone_arg __asm__ ("d1") = flags | CLONE_VM;
+   register long clone_arg __asm__ ("d1") = flags | CLONE_VM | SIGCHLD;
 
    __asm__ __volatile__
      ("clrl %%d2\n\t"
diff -ur 2.4/arch/mips/kernel/process.c build-2.4/arch/mips/kernel/process.c
--- 2.4/arch/mips/kernel/process.c  Sat Apr  7 22:01:56 2001
+++ build-2.4/arch/mips/kernel/process.c    Sun Apr 15 14:52:12 2001
@@ -161,6 +161,8 @@
 {
    long retval;
 
+   flags |= SIGCHLD;
+
    __asm__ __volatile__(
        ".set\tnoreorder\n\t"
        "move\t$6,$sp\n\t"
diff -ur 2.4/arch/mips64/kernel/process.c build-2.4/arch/mips64/kernel/process.c
--- 2.4/arch/mips64/kernel/process.c    Thu Feb 22 22:28:55 2001
+++ build-2.4/arch/mips64/kernel/process.c  Sun Apr 15 14:52:21 2001
@@ -154,6 +154,8 @@
 {
    int retval;
 
+   flags |= SIGCHLD;
+
    __asm__ __volatile__(
        "move\t$6, $sp\n\t"
        "move\t$4, %5\n\t"
diff -ur 2.4/arch/parisc/kernel/entry.S build-2.4/arch/parisc/kernel/entry.S
--- 2.4/arch/parisc/kernel/entry.S  Sat Apr  7 22:01:58 2001
+++ build-2.4/arch/parisc/kernel/entry.S    Sun Apr 15 14:56:58 2001
@@ -497,7 +497,7 @@
 #endif
    STREG   %r26, 



Re: [PATCH] Re: 8139too: defunct threads

2001-04-14 Thread Rod Stewart


On Sat, 14 Apr 2001, Manfred Spraul wrote:
> From: "Alan Cox" <[EMAIL PROTECTED]>
> >
> > That has an implicit race, a zombie can always appear as we are
> > execing init.
> > I think init wants fixing
> >
> Rod, could you boot again with the unpatched kernel and send a sigchild
> to init?
>
> #kill -CHLD 1
>
> If I understand the init code correctly the sigchild handler reaps all
> zombies, but probably the signal got lost because the children died
> before the parent was created ;-)

That doesn't 'fix' it.  The thing I find funny is that it only appears
when IP_PNP is compiled in.  It is as if the driver ends up in some weird
state when IP_PNP is used.  According to ps, from my limited
understanding, the thread is stuck in do_exit

[root@stewart-nw34 /root]# ps elaxww|grep eth
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
044     0     7     1   9   0     0    0 do_exi Z    ?          0:00 [eth0 ]
044     0     8     1   9   0     0    0 do_exi Z    ?          0:00 [eth1 ]
044     0     9     1   9   0     0    0 do_exi Z    ?          0:00 [eth2 ]
040     0   229     1   9   0     0    0 rtl813 SW   ?          0:00 [eth1]

Thanks for helping with this,
-Rms




Re: [PATCH] Re: 8139too: defunct threads

2001-04-14 Thread Andreas Ferber

Hi,

On Sat, Apr 14, 2001 at 07:53:28PM +0100, Alan Cox wrote:
> > Rod's init version (from RH 7.0) doesn't reap children that died before
> > it was started. Is that an init bug or should the kernel reap them
> > before the execve?
> I would say that's an init bug

It doesn't seem to be that simple.

Redhat's init does child reaping in its SIGCHLD handler using the
following:

while((pid = waitpid(-1, , WNOHANG)) != 0) {
        if (errno == ECHILD) break;
        /* do some stuff, nothing which could break out of the loop */
}

This should reap all leftover children from kernel startup when init
receives SIGCHLD for the first time, but somehow the kernel seems to
skip them while searching for a dead process in sys_wait4().  I can't
do any further testing because I don't have an 8139 NIC, but I can't
find a problem in init's child reaping code.
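
For reference, a non-blocking reaping loop of this general shape can be tried in user space with the sketch below. It is not init's actual handler: spawn_short_lived_children() and reap_exited_children() are hypothetical helpers, and the loop stops on any return value that is not a pid instead of checking errno inside the loop body.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork `n` children that exit at once; returns how many were started. */
static int spawn_short_lived_children(int n)
{
    int started = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid == 0)
            _exit(0);       /* child: exit and wait to be reaped */
        if (pid > 0)
            started++;
    }
    return started;
}

/*
 * Reap every ordinary (SIGCHLD) child that has already exited, without
 * blocking.  Returns the number of children reaped in this pass.
 */
static int reap_exited_children(void)
{
    int status;
    int reaped = 0;
    pid_t pid;

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
        reaped++;
    /*
     * pid == 0:  children exist, but none has exited yet.
     * pid == -1: with errno == ECHILD, no matching children are left.
     */
    return reaped;
}
```

Because no __WALL or __WCLONE flag is passed, a loop like this only ever sees children whose exit signal is SIGCHLD, so it would behave the same way whether or not other kinds of zombies exist.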

Please tell me if I'm missing something, but I think this is really a
kernel issue, not a bug in init.

Andreas
-- 
I've finally learned what "upward compatible" means.  It means we get to
keep all our old mistakes.
-- Dennie van Tassel




Re: [PATCH] Re: 8139too: defunct threads

2001-04-14 Thread Manfred Spraul

From: "Alan Cox" <[EMAIL PROTECTED]>
>
> That has an implicit race, a zombie can always appear as we are
> execing init.
> I think init wants fixing
>
Rod, could you boot again with the unpatched kernel and send a sigchild
to init?

#kill -CHLD 1

If I understand the init code correctly the sigchild handler reaps all
zombies, but probably the signal got lost because the children died
before the parent was created ;-)

--
Manfred




Re: [PATCH] Re: 8139too: defunct threads

2001-04-14 Thread Alan Cox

> Rod's init version (from RH 7.0) doesn't reap children that died before
> it was started. Is that an init bug or should the kernel reap them
> before the execve?

I would say that's an init bug

> The attached patch reaps all zombies before the execve("/sbin/init").

That has an implicit race, a zombie can always appear as we are execing init.
I think init wants fixing




[PATCH] Re: 8139too: defunct threads

2001-04-14 Thread Manfred Spraul

Hi Alan,

Rod's init version (from RH 7.0) doesn't reap children that died before
it was started. Is that an init bug or should the kernel reap them
before the execve?
The attached patch reaps all zombies before the execve("/sbin/init").

I also found a bug in kernel/context.c: it doesn't acquire the sigmask
spinlock around the call to recalc_sigpending.

Rod Stewart wrote:
> 
> Yes, that fixes my problem.  No more defunct eth? processes when IP_PNP is
> compiled in.  With the fix you said to the patch; replacing curtask with
> current.
>
Fortunately you don't use SMP - spin_lock_irq();...;spin_lock_irq()
instead of spin_lock_irq();...;spin_unlock_irq();

--
Manfred

// $Header$
// Kernel Version:
//  VERSION = 2
//  PATCHLEVEL = 4
//  SUBLEVEL = 3
//  EXTRAVERSION = -ac3
--- 2.4/init/main.c Sat Apr  7 22:02:27 2001
+++ build-2.4/init/main.c   Sat Apr 14 19:18:34 2001
@@ -883,6 +883,13 @@
 
    (void) dup(0);
    (void) dup(0);
+
+   while (waitpid(-1, (unsigned int *)0, __WALL|WNOHANG) > 0)
+       ;
+   spin_lock_irq(&current->sigmask_lock);
+   flush_signals(current);
+   recalc_sigpending(current);
+   spin_unlock_irq(&current->sigmask_lock);
 
    /*
     * We try each of these until one succeeds.
--- 2.4/kernel/context.c    Fri Feb  2 15:20:37 2001
+++ build-2.4/kernel/context.c  Sat Apr 14 19:09:10 2001
@@ -101,8 +101,10 @@
        if (signal_pending(curtask)) {
            while (waitpid(-1, (unsigned int *)0, __WALL|WNOHANG) > 0)
                ;
+           spin_lock_irq(&curtask->sigmask_lock);
            flush_signals(curtask);
            recalc_sigpending(curtask);
+           spin_unlock_irq(&curtask->sigmask_lock);
        }
    }
 }





Re: 8139too: defunct threads

2001-04-14 Thread Rod Stewart


On Sat, 14 Apr 2001, Manfred Spraul wrote:

> >> Ah. Of course. All (or most) kernel initialisation is
> >> done by PID 1. Search for "kernel_thread" in init/main.c
> >>
> >> So it seems that in your setup, process 1 is not reaping
> >> children, which is why this hasn't been reported before.
> >> Is there something unusual about your setup?
>
> > I found the difference which causes this. If I build my kernel with
> > IP_PNP (IP: kernel level autoconfiguration) support I get a defunct
> > thread for each 8139too device. If I don't build with IP_PNP
> > support I don't get any defunct ethernet threads.
>
> Does init(8) reap children that died before it was spawned? I assume
> that the defunct tasks were there _before_ init was spawned.
>
> Perhaps init() [in linux/init/main.c] should reap all defunct tasks
> before the execve("/sbin/init").
>
> I've attached an untested patch, could you try it?

Yes, that fixes my problem.  No more defunct eth? processes when IP_PNP is
compiled in.  With the fix you said to the patch; replacing curtask with
current.

Thanks,
-Rms




Re: 8139too: defunct threads

2001-04-14 Thread Manfred Spraul

>> Ah. Of course. All (or most) kernel initialisation is
>> done by PID 1. Search for "kernel_thread" in init/main.c
>>
>> So it seems that in your setup, process 1 is not reaping
>> children, which is why this hasn't been reported before.
>> Is there something unusual about your setup?

> I found the difference which causes this. If I build my kernel with
> IP_PNP (IP: kernel level autoconfiguration) support I get a defunct
> thread for each 8139too device. If I don't build with IP_PNP
> support I don't get any defunct ethernet threads.

Does init(8) reap children that died before it was spawned? I assume
that the defunct tasks were there _before_ init was spawned.

Perhaps init() [in linux/init/main.c] should reap all defunct tasks
before the execve("/sbin/init").

I've attached an untested patch, could you try it?

--
Manfred


 patch-main.dat


[PATCH] Re: 8139too: defunct threads

2001-04-14 Thread Manfred Spraul

Hi Alan,

Rod's init version (from RH 7.0) doesn't reap children that died before
it was started. Is that an init bug or should the kernel reap them
before the execve?
The attached patch reaps all zombies before the execve("/sbin/init").

I also found a bug in kernel/context.c: it doesn't acquire the sigmask
spinlock around the call to recalc_sigpending.

Rod Stewart wrote:
>
> Yes, that fixes my problem.  No more defunct eth? processes when IP_PNP is
> compiled in.  With the fix you said to the patch; replacing curtask with
> current.

Fortunately you don't use SMP - spin_lock_irq();...;spin_lock_irq()
instead of spin_lock_irq();...;spin_unlock_irq();

--
Manfred

// $Header$
// Kernel Version:
//  VERSION = 2
//  PATCHLEVEL = 4
//  SUBLEVEL = 3
//  EXTRAVERSION = -ac3
--- 2.4/init/main.c	Sat Apr  7 22:02:27 2001
+++ build-2.4/init/main.c	Sat Apr 14 19:18:34 2001
@@ -883,6 +883,13 @@
 
 	(void) dup(0);
 	(void) dup(0);
+
+	while (waitpid(-1, (unsigned int *)0, __WALL|WNOHANG) > 0)
+		;
+	spin_lock_irq(&current->sigmask_lock);
+	flush_signals(current);
+	recalc_sigpending(current);
+	spin_unlock_irq(&current->sigmask_lock);
 
 	/*
 	 * We try each of these until one succeeds.
--- 2.4/kernel/context.c	Fri Feb  2 15:20:37 2001
+++ build-2.4/kernel/context.c	Sat Apr 14 19:09:10 2001
@@ -101,8 +101,10 @@
 		if (signal_pending(curtask)) {
 			while (waitpid(-1, (unsigned int *)0, __WALL|WNOHANG) > 0)
 				;
+			spin_lock_irq(&curtask->sigmask_lock);
 			flush_signals(curtask);
 			recalc_sigpending(curtask);
+			spin_unlock_irq(&curtask->sigmask_lock);
 		}
 	}
 }
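
A note on the __WALL flag in the waitpid() calls of the patch: on Linux a
plain waitpid() only considers children whose exit signal is SIGCHLD, while
__WCLONE selects the other kind and __WALL selects both.  The user-space
sketch below only illustrates that rule and is not part of the patch.  The
function names are made up for the example, and it assumes Linux with glibc,
a downward-growing stack, and that clone() with flags 0 is permitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Child body (hypothetical name): exit at once with status 0. */
static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Returns 1 if a plain waitpid() fails with ECHILD for a child created
 * with exit signal 0, while waitpid() with __WALL collects it.
 */
static int plain_wait_misses_clone_child(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    int status, missed, found;
    pid_t pid;

    assert(stack != NULL);      /* demo only: give up if out of memory */

    /* flags == 0: own address space, exit signal 0 (not SIGCHLD). */
    pid = clone(child_fn, stack + stack_size, 0, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* Without __WCLONE/__WALL the child is not eligible at all. */
    missed = (waitpid(pid, &status, 0) == -1 && errno == ECHILD);

    /* With __WALL the same child is found and reaped. */
    found = (waitpid(pid, &status, __WALL) == pid);

    free(stack);
    return missed && found;
}
```

Calling plain_wait_misses_clone_child() from a test program should return 1
under the assumptions above.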





Re: [PATCH] Re: 8139too: defunct threads

2001-04-14 Thread Alan Cox

> Rod's init version (from RH 7.0) doesn't reap children that died before
> it was started. Is that an init bug or should the kernel reap them
> before the execve?

I would say that's an init bug

> The attached patch reaps all zombies before the execve("/sbin/init").

That has an implicit race, a zombie can always appear as we are execing init.
I think init wants fixing




Re: [PATCH] Re: 8139too: defunct threads

2001-04-14 Thread Manfred Spraul

From: "Alan Cox" <[EMAIL PROTECTED]>

> That has an implicit race, a zombie can always appear as we are
> execing init.
> I think init wants fixing

Rod, could you boot again with the unpatched kernel and send a SIGCHLD
to init?

# kill -CHLD 1

If I understand the init code correctly, the SIGCHLD handler reaps all
zombies, but the signal probably got lost because the children died
before the parent was created ;-)

--
Manfred




Re: [PATCH] Re: 8139too: defunct threads

2001-04-14 Thread Andreas Ferber

Hi,

On Sat, Apr 14, 2001 at 07:53:28PM +0100, Alan Cox wrote:
> > Rod's init version (from RH 7.0) doesn't reap children that died before
> > it was started. Is that an init bug or should the kernel reap them
> > before the execve?
> I would say thats an init bug

It doesn't seem to be that simple.

Redhat's init does child reaping in its SIGCHLD handler using the
following:

while((pid = waitpid(-1, &st, WNOHANG)) != 0) {
        if (errno == ECHILD) break;
        /* do some stuff, nothing which could break out of the loop */
}

This should reap all leftover children from kernel startup when init
receives SIGCHLD for the first time, but somehow the kernel seems to
skip them while searching for a dead process in sys_wait4().  I can't
do any further testing because I don't have an 8139 NIC, but I can't
find a problem in init's child reaping code.

Please tell me if I'm missing something, but I think this is really a
kernel issue, not a bug in init.

Andreas
-- 
I've finally learned what "upward compatible" means.  It means we get to
keep all our old mistakes.
-- Dennie van Tassel




Re: [PATCH] Re: 8139too: defunct threads

2001-04-14 Thread Rod Stewart


On Sat, 14 Apr 2001, Manfred Spraul wrote:
> From: "Alan Cox" <[EMAIL PROTECTED]>
> >
> > That has an implicit race, a zombie can always appear as we are
> > execing init.
> > I think init wants fixing
>
> Rod, could you boot again with the unpatched kernel and send a sigchild
> to init?
>
> #kill -CHLD 1
>
> If I understand the init code correctly the sigchild handler reaps all
> zombies, but probably the signal got lost because the children died
> before the parent was created ;-)

That doesn't 'fix' it.  The thing I find funny is that it only appears
when IP_PNP is compiled in.  It is as if the driver ends up in some weird
state when IP_PNP is used.  According to ps, from my limited
understanding, the thread is stuck in do_exit

[root@stewart-nw34 /root]# ps elaxww|grep eth
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY   TIME COMMAND
044     0     7     1   9   0     0    0 do_exi Z    ?     0:00 [eth0 <defunct>]
044     0     8     1   9   0     0    0 do_exi Z    ?     0:00 [eth1 <defunct>]
044     0     9     1   9   0     0    0 do_exi Z    ?     0:00 [eth2 <defunct>]
040     0   229     1   9   0     0    0 rtl813 SW   ?     0:00 [eth1]

Thanks for helping with this,
-Rms




Re: 8139too: defunct threads

2001-04-13 Thread Rod Stewart


On Thu, 12 Apr 2001, Andrew Morton wrote:
> Rod Stewart wrote:
> >
> > On Thu, 12 Apr 2001, Andrew Morton wrote:
> > > Rod Stewart wrote:
> > > >
> > > > Hello,
> > > >
> > > > Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
> > > > thread for each device we have; if the driver is built into the kernel.
> > > > If the driver is built as a module, no defunct threads appear.
> > >
> > > What is the parent PID for the defunct tasks?  zero?
> >
> > According to ps, 1
>
> Ah.  Of course.  All (or most) kernel initialisation is
> done by PID 1.  Search for "kernel_thread" in init/main.c
>
> So it seems that in your setup, process 1 is not reaping
> children, which is why this hasn't been reported before.
> Is there something unusual about your setup?

I found the difference which causes this.  If I build my kernel with
IP_PNP (IP: kernel level autoconfiguration) support I get a defunct thread
for each 8139too device.  If I don't build with IP_PNP support I don't get
any defunct ethernet threads.

Does this make any sense?  Here is the relevant diff from a working config to a
bad one:
[root@stewart-nw34 conf]# diff -u config-p5-good config-p6-bad
--- config-p5-good  Fri Apr 13 16:07:10 2001
+++ config-p6-bad   Fri Apr 13 16:12:21 2001
@@ -173,7 +173,9 @@
 # CONFIG_IP_ROUTE_TOS is not set
 # CONFIG_IP_ROUTE_VERBOSE is not set
 # CONFIG_IP_ROUTE_LARGE_TABLES is not set
-# CONFIG_IP_PNP is not set
+CONFIG_IP_PNP=y
+# CONFIG_IP_PNP_BOOTP is not set
+# CONFIG_IP_PNP_RARP is not set
 CONFIG_NET_IPIP=m
 CONFIG_NET_IPGRE=m
 # CONFIG_NET_IPGRE_BROADCAST is not set

Thanks,
-Rms




Re: 8139too: defunct threads

2001-04-12 Thread David Woodhouse



[EMAIL PROTECTED] said:
>  ho-hum.  Jeff, I think the best fix here is to bite the bullet and
> write kernel_daemon(), which will delegate thread creation to keventd,
> which is the only thing we have which reaps zombies.  Any better
> ideas?

Yes. Let init do it, as God intended. Why reap threads in the kernel when 
they could just reparent themselves as children of pid 1?

--
dwmw2





Re: 8139too: defunct threads

2001-04-12 Thread Rod Stewart


On Thu, 12 Apr 2001, Andrew Morton wrote:
> Rod Stewart wrote:
> >
> > On Thu, 12 Apr 2001, Andrew Morton wrote:
> > > Is there something unusual about your setup?
> >
> > One box is standard PIII with RH 7.0, the other is a custom Crusoe TM5400
> > board.  But from further investigation it appears to be a kernel config
> > option.  As I've got a 2.4.0 kernel which has very little compiled in and
> > not showing the problem and another kernel which has many more networking
> > options built in and showing the problem.  I've seen this problem
> > since 2.4.0.test11.
> >
>
> Sorry.  I meant: what is process 1 on this machine?  Is it not
> the normal init?  If not, then according to Alan, the fault
> lies with your userspace.  Kernel requires that PID 1 reap
> children.

Yes, it is the normal init on both boxes.

-Rms




Re: 8139too: defunct threads

2001-04-12 Thread Andrew Morton

Rod Stewart wrote:
> 
> On Thu, 12 Apr 2001, Andrew Morton wrote:
> > Is there something unusual about your setup?
> 
> One box is standard PIII with RH 7.0, the other is a custom Crusoe TM5400
> board.  But from further investigation it appears to be a kernel config
> option.  As I've got a 2.4.0 kernel which has very little compiled in and
> not showing the problem and another kernel which has many more networking
> options built in and showing the problem.  I've seen this problem
> since 2.4.0.test11.
> 

Sorry.  I meant: what is process 1 on this machine?  Is it not
the normal init?  If not, then according to Alan, the fault
lies with your userspace.  Kernel requires that PID 1 reap
children.



Re: 8139too: defunct threads

2001-04-12 Thread Rod Stewart

On Thu, 12 Apr 2001, Andrew Morton wrote:
> Rod Stewart wrote:
> >
> > On Thu, 12 Apr 2001, Andrew Morton wrote:
> > > Rod Stewart wrote:
> > > >
> > > > Hello,
> > > >
> > > > Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
> > > > thread for each device we have; if the driver is built into the kernel.
> > > > If the driver is built as a module, no defunct threads appear.
> > >
> > > What is the parent PID for the defunct tasks?  zero?
> >
> > According to ps, 1
>
> Ah.  Of course.  All (or most) kernel initialisation is
> done by PID 1.  Search for "kernel_thread" in init/main.c
>
> So it seems that in your setup, process 1 is not reaping
> children, which is why this hasn't been reported before.
> Is there something unusual about your setup?

One box is a standard PIII with RH 7.0, the other is a custom Crusoe TM5400
board.  But from further investigation it appears to be a kernel config
option: I've got a 2.4.0 kernel with very little compiled in that does not
show the problem, and another kernel with many more networking options
built in that does.  I've seen this problem since 2.4.0.test11.

I'll send a note once I find the config option which is causing this,
probably tomorrow.

Thanks,
-Rms




Re: 8139too: defunct threads

2001-04-12 Thread Alan Cox

> Plus it would mean that the kernel requires, for its
> correct operation, that process "1" is a child reaper.
> Is this a good thing?

That is already required. The rest of the reparenting functionality is also
in kernel/exit.c already




Re: 8139too: defunct threads

2001-04-12 Thread Andrew Morton

Rod Stewart wrote:
> 
> On Thu, 12 Apr 2001, Andrew Morton wrote:
> > Rod Stewart wrote:
> > >
> > > Hello,
> > >
> > > Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
> > > thread for each device we have; if the driver is built into the kernel.
> > > If the driver is built as a module, no defunct threads appear.
> >
> > What is the parent PID for the defunct tasks?  zero?
> 
> According to ps, 1

Ah.  Of course.  All (or most) kernel initialisation is
done by PID 1.  Search for "kernel_thread" in init/main.c

So it seems that in your setup, process 1 is not reaping
children, which is why this hasn't been reported before.
Is there something unusual about your setup?



Re: 8139too: defunct threads

2001-04-12 Thread Andrew Morton

Alan Cox wrote:
> 
> > <slaps head> swapper doesn't know how to reap children, and
> > AFAIK there's no way for a kernel thread to fully clean itself
> > up.  This is always done by the parent.
> 
> Make daemonize() move threads with parent 0 to parent 1

Reparenting would require diving inside this lot:

/* 
 * pointers to (original) parent process, youngest child, younger sibling,
 * older sibling, respectively.  (p->father can be replaced with 
 * p->p_pptr->pid)
 */
struct task_struct *p_opptr, *p_pptr, *p_cptr, *p_ysptr, *p_osptr;
struct list_head thread_group;

plus maybe rewriting pgrps, sessions, gids, etc.  Challenging.

Plus it would mean that the kernel requires, for its
correct operation, that process "1" is a child reaper.
Is this a good thing?
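
To give a feel for the pointer surgery involved, here is a toy user-space
model of just the four family links from the comment above.  struct toy_task
and the toy_* helpers are invented for this sketch and are not kernel code;
the real thing would also have to hold the task list lock and deal with the
thread group, process groups and sessions, none of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the family pointers; not the kernel's task_struct. */
struct toy_task {
    int pid;
    struct toy_task *p_pptr;    /* parent */
    struct toy_task *p_cptr;    /* youngest child */
    struct toy_task *p_ysptr;   /* younger sibling */
    struct toy_task *p_osptr;   /* older sibling */
};

/* Make t the youngest child of parent. */
static void toy_link(struct toy_task *t, struct toy_task *parent)
{
    t->p_pptr = parent;
    t->p_ysptr = NULL;
    t->p_osptr = parent->p_cptr;        /* previous youngest becomes older */
    if (t->p_osptr)
        t->p_osptr->p_ysptr = t;
    parent->p_cptr = t;
}

/* Detach t from its parent's list of children. */
static void toy_unlink(struct toy_task *t)
{
    if (t->p_osptr)
        t->p_osptr->p_ysptr = t->p_ysptr;
    if (t->p_ysptr)
        t->p_ysptr->p_osptr = t->p_osptr;
    else
        t->p_pptr->p_cptr = t->p_osptr; /* t was the youngest child */
    t->p_pptr = t->p_osptr = t->p_ysptr = NULL;
}

/* Move t from its current parent to new_parent. */
static void toy_reparent(struct toy_task *t, struct toy_task *new_parent)
{
    assert(t->p_pptr != NULL && new_parent != NULL);
    toy_unlink(t);
    toy_link(t, new_parent);
}
```

With tasks for swapper (pid 0) and init (pid 1) set up via toy_link(), a call
such as toy_reparent(&thread, &init) moves a thread from swapper's child list
to init's while keeping the sibling chain of the remaining children intact.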



Re: 8139too: defunct threads

2001-04-12 Thread Rod Stewart


On Thu, 12 Apr 2001, Andrew Morton wrote:
> Rod Stewart wrote:
> >
> > Hello,
> >
> > Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
> > thread for each device we have; if the driver is built into the kernel.
> > If the driver is built as a module, no defunct threads appear.
>
> What is the parent PID for the defunct tasks?  zero?

According to ps, 1

[root@stewart-nw34 networking]# ps alexw
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY   TIME COMMAND
044     0    14     1   9   0     0    0 do_exi Z    ?     0:00 [eth0 <defunct>]
044     0    15     1   9   0     0    0 do_exi Z    ?     0:00 [eth1 <defunct>]
044     0    16     1   9   0     0    0 do_exi Z    ?     0:00 [eth2 <defunct>]
040     0   240     1   9   0     0    0 rtl813 SW   ?     0:00 [eth0]

-Rms




Re: 8139too: defunct threads

2001-04-12 Thread Alan Cox

> <slaps head> swapper doesn't know how to reap children, and
> AFAIK there's no way for a kernel thread to fully clean itself
> up.  This is always done by the parent.

Make daemonize() move threads with parent 0 to parent 1




Re: 8139too: defunct threads

2001-04-12 Thread Andrew Morton

Rod Stewart wrote:
> 
> Hello,
> 
> Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
> thread for each device we have; if the driver is built into the kernel.
> If the driver is built as a module, no defunct threads appear.

What is the parent PID for the defunct tasks?  zero?

<slaps head> swapper doesn't know how to reap children, and
AFAIK there's no way for a kernel thread to fully clean itself
up.  This is always done by the parent.

ho-hum.  Jeff, I think the best fix here is to bite the
bullet and write kernel_daemon(), which will delegate
thread creation to keventd, which is the only thing
we have which reaps zombies.  Any better ideas?



8139too: defunct threads

2001-04-12 Thread Rod Stewart


Hello,

Using the 8139too driver, 0.9.15c, we have noticed that we get a defunct
thread for each device we have; if the driver is built into the kernel.
If the driver is built as a module, no defunct threads appear.

This has happened with any 2.4 kernel we've used, up to and including
2.4.3.

Below is the output from a custom board (but the problem also shows up
with a standard PCI card with RTL-8139B) with three RealTek RTL8139
chipsets on it.

[root@stewart-nw34 /root]# ps uax|grep eth
root        14  0.0  0.0     0    0 ?        Z    13:39   0:00 [eth0 <defunct>]
root        15  0.0  0.0     0    0 ?        Z    13:39   0:00 [eth1 <defunct>]
root        16  0.0  0.0     0    0 ?        Z    13:39   0:00 [eth2 <defunct>]
root       240  0.0  0.0     0    0 ?        SW   13:39   0:00 [eth0]
root       572  0.0  0.0     0    0 pts/1    SW   13:49   0:00 [eth1]
root       538  0.0  0.4  1216  460 pts/0    S    13:41   0:00 grep eth

8139too Fast Ethernet driver 0.9.15c loaded
PCI: Enabling device 00:05.0 ( -> 0003)
PCI: Assigned IRQ 6 for device 00:05.0
PCI: Setting latency timer of device 00:05.0 to 64
eth0: RealTek RTL8139 Fast Ethernet at 0xc780, 00:10:57:01:00:19,
IRQ 6
eth0:  Identified 8139 chip type 'RTL-8139C'
PCI: Enabling device 00:09.0 ( -> 0003)
PCI: Assigned IRQ 6 for device 00:09.0
PCI: Setting latency timer of device 00:09.0 to 64
eth1: RealTek RTL8139 Fast Ethernet at 0xc7802100, 00:10:57:02:00:19,
IRQ 6
eth1:  Identified 8139 chip type 'RTL-8139C'
PCI: Enabling device 00:0a.0 ( -> 0003)
PCI: Assigned IRQ 6 for device 00:0a.0
PCI: Setting latency timer of device 00:0a.0 to 64
eth2: RealTek RTL8139 Fast Ethernet at 0xc7804200, 00:10:57:03:00:19,
IRQ 6
eth2:  Identified 8139 chip type 'RTL-8139C'


I'm not certain whether this is expected behaviour; if it is, we'll tell
QA to ignore it.

Thanks,
-Rms



