Re: NULL pointer problem in pid selection ?

2003-03-10 Thread John Baldwin

On 08-Mar-2003 Kris Kennaway wrote:
 On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
 
 Just got this crash on -current, and I believe I have seen similar
 before.  addr2line(1) reports the faulting address to be
  ../../../kern/kern_fork.c:395
 which is in the inner loop of pid collision avoidance.
 
 I've been running this patch from Alfred for the past month or so on
 bento, which has fixed a similar panic I was seeing regularly.

Using just a shared lock instead of an xlock should be ok there.  You
aren't modifying the process tree, just looking at it.  OTOH, the
proc lock is supposed to protect p_pgrp and p_session, so they shouldn't
be NULL. :(

 Kris
 
 Index: kern/kern_fork.c
 ===
 RCS file: /home/ncvs/src/sys/kern/kern_fork.c,v
 retrieving revision 1.186
 diff -u -r1.186 kern_fork.c
 --- kern/kern_fork.c  27 Feb 2003 02:05:17 -  1.186
 +++ kern/kern_fork.c  4 Mar 2003 00:28:09 -
 @@ -325,6 +325,7 @@
  	 * exceed the limit. The variable nprocs is the current number of
  	 * processes, maxproc is the limit.
  	 */
 +	sx_xlock(&proctree_lock);
  	sx_xlock(&allproc_lock);
  	uid = td->td_ucred->cr_ruid;
  	if ((nprocs >= maxproc - 10 && uid != 0) || nprocs >= maxproc) {
 @@ -432,6 +433,7 @@
  	LIST_INSERT_HEAD(&allproc, p2, p_list);
  	LIST_INSERT_HEAD(PIDHASH(p2->p_pid), p2, p_hash);
  	sx_xunlock(&allproc_lock);
 +	sx_xunlock(&proctree_lock);
  
  	/*
  	 * Malloc things while we don't hold any locks.
 @@ -757,6 +759,7 @@
  	return (0);
  fail:
  	sx_xunlock(&allproc_lock);
 +	sx_xunlock(&proctree_lock);
  	uma_zfree(proc_zone, newproc);
  	if (p1->p_flag & P_THREADED) {
  		PROC_LOCK(p1);
 
 
 
 Poul-Henning
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; lapic.id = 
 fault virtual address   = 0x14
 fault code  = supervisor read, page not present
 instruction pointer = 0x8:0xc01c3eec
 stack pointer   = 0x10:0xe74e3c74
 frame pointer   = 0x10:0xe74e3cbc
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 99777 (sh)
 trap number = 12
 panic: page fault
 cpuid = 0; lapic.id = 
 Stack backtrace:
 backtrace(c032ff8e,0,c03394ce,e74e3b68,1) at 0xc01d86a7 = backtrace+0x17
 panic(c03394ce,c0342131,cfe5496c,1,1) at 0xc01d87ba = panic+0x10a
 trap_fatal(e74e3c34,14,c03422ba,2e3,cfe4fa50) at 0xc02fa672 = trap_fatal+0x322
 trap_pfault(e74e3c34,0,14,c035a038,14) at 0xc02fa322 = trap_pfault+0x1c2
 trap(18,10,10,cf19c3f8,cf76b9ec) at 0xc02f9e9d = trap+0x3cd
 calltrap() at 0xc02e2cd8 = calltrap+0x5
 --- trap 0xc, eip = 0xc01c3eec, esp = 0xe74e3c74, ebp = 0xe74e3cbc ---
 fork1(cfe4fa50,14,0,e74e3cd4,cfe54858) at 0xc01c3eec = fork1+0x3fc
 fork(cfe4fa50,e74e3d10,c03422ba,404,0) at 0xc01c3852 = fork+0x52
 syscall(2f,2f,2f,0,80ff000) at 0xc02fa98e = syscall+0x26e
 Xint0x80_syscall() at 0xc02e2d2d = Xint0x80_syscall+0x1d
 --- syscall (2), eip = 0x807ba9f, esp = 0xbfbff6bc, ebp = 0xbfbff6e8 ---
 boot() called on cpu#0
 
 -- 
 Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
 [EMAIL PROTECTED] | TCP/IP since RFC 956
 FreeBSD committer   | BSD since 4.3-tahoe
 Never attribute to malice what can adequately be explained by incompetence.
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with unsubscribe freebsd-current in the body of the message

-- 

John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/



Re: NULL pointer problem in pid selection ?

2003-03-10 Thread Tim Robbins
On Mon, Mar 10, 2003 at 01:00:15PM -0500, John Baldwin wrote:

 On 08-Mar-2003 Kris Kennaway wrote:
  On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
  
  Just got this crash on -current, and I believe I have seen similar
  before.  addr2line(1) reports the faulting address to be
   ../../../kern/kern_fork.c:395
  which is in the inner loop of pid collision avoidance.
  
  I've been running this patch from Alfred for the past month or so on
  bento, which has fixed a similar panic I was seeing regularly.
 
 Using just a shared lock instead of an xlock should be ok there.  You
 aren't modifying the process tree, just looking at it.  OTOH, the
 proc lock is supposed to protect p_pgrp and p_session, so they shouldn't
 be NULL. :(

I have a suspicion that the bug is actually in wait1():

	sx_xlock(&proctree_lock);
[...]
	/*
	 * Remove other references to this process to ensure
	 * we have an exclusive reference.
	 */
	leavepgrp(p);

	sx_xlock(&allproc_lock);
	LIST_REMOVE(p, p_list);	/* off zombproc */
	sx_xunlock(&allproc_lock);

	LIST_REMOVE(p, p_sibling);
	sx_xunlock(&proctree_lock);


Shouldn't we be removing the process from zombproc before setting
p_pgrp to NULL via leavepgrp()? Does this even matter at all when both
fork1() and wait1() are still protected by Giant?


Tim



Re: NULL pointer problem in pid selection ?

2003-03-10 Thread John Baldwin

On 10-Mar-2003 Tim Robbins wrote:
 On Mon, Mar 10, 2003 at 01:00:15PM -0500, John Baldwin wrote:
 
 On 08-Mar-2003 Kris Kennaway wrote:
  On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
  
  Just got this crash on -current, and I believe I have seen similar
  before.  addr2line(1) reports the faulting address to be
   ../../../kern/kern_fork.c:395
  which is in the inner loop of pid collision avoidance.
  
  I've been running this patch from Alfred for the past month or so on
  bento, which has fixed a similar panic I was seeing regularly.
 
 Using just a shared lock instead of an xlock should be ok there.  You
 aren't modifying the process tree, just looking at it.  OTOH, the
 proc lock is supposed to protect p_pgrp and p_session, so they shouldn't
 be NULL. :(
 
 I have a suspicion that the bug is actually in wait1():
 
 	sx_xlock(&proctree_lock);
 [...]
 	/*
 	 * Remove other references to this process to ensure
 	 * we have an exclusive reference.
 	 */
 	leavepgrp(p);
 
 	sx_xlock(&allproc_lock);
 	LIST_REMOVE(p, p_list);	/* off zombproc */
 	sx_xunlock(&allproc_lock);
 
 	LIST_REMOVE(p, p_sibling);
 	sx_xunlock(&proctree_lock);
 
 
 Shouldn't we be removing the process from zombproc before setting
 p_pgrp to NULL via leavepgrp()? Does this even matter at all when both
 fork1() and wait1() are still protected by Giant?

Giant doesn't help you with sleeps.  However, removing the process from
zombproc before destroying its other linkages might be more correct, yes.

 Tim

-- 

John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/



Re: NULL pointer problem in pid selection ?

2003-03-10 Thread Thomas Moestl
On Tue, 2003/03/11 at 08:43:46 +1100, Tim Robbins wrote:
 On Mon, Mar 10, 2003 at 01:00:15PM -0500, John Baldwin wrote:
 
  On 08-Mar-2003 Kris Kennaway wrote:
   On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
   
   Just got this crash on -current, and I believe I have seen similar
   before.  addr2line(1) reports the faulting address to be
../../../kern/kern_fork.c:395
   which is in the inner loop of pid collision avoidance.
   
   I've been running this patch from Alfred for the past month or so on
   bento, which has fixed a similar panic I was seeing regularly.
  
  Using just a shared lock instead of an xlock should be ok there.  You
  aren't modifying the process tree, just looking at it.  OTOH, the
  proc lock is supposed to protect p_pgrp and p_session, so they shouldn't
  be NULL. :(
 
 I have a suspicion that the bug is actually in wait1():
 
 	sx_xlock(&proctree_lock);
 [...]
 	/*
 	 * Remove other references to this process to ensure
 	 * we have an exclusive reference.
 	 */
 	leavepgrp(p);
 
 	sx_xlock(&allproc_lock);
 	LIST_REMOVE(p, p_list);	/* off zombproc */
 	sx_xunlock(&allproc_lock);
 
 	LIST_REMOVE(p, p_sibling);
 	sx_xunlock(&proctree_lock);
 
 
 Shouldn't we be removing the process from zombproc before setting
 p_pgrp to NULL via leavepgrp()? Does this even matter at all when both
 fork1() and wait1() are still protected by Giant?

Hmmm, I think you're right; if allproc_lock happens to be contested in
fork1() (which can happen because it is locked without Giant held
in some places, and because sleeping with an sx lock is allowed),
we'll go to sleep there, dropping Giant. This opens up a race, since
wait1() can now proceed until after the leavepgrp() before blocking;
when allproc_lock is released, fork1() will be the first to pick it
up, and this panic will happen.
Seems that I relied on Giant too much when I first took a look into
that code :)
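
To make the window concrete, here is a single-threaded userland model of the two orderings. It is a sketch under assumptions, not kernel code: struct model_proc, scan_would_fault() and both reap_* functions are invented for illustration, and the "scanner wins the lock" moment is simulated by calling the scan between the two steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_pgrp { int pg_id; };

struct model_proc {
	struct model_proc *next;	/* zombie-list linkage (model only) */
	struct model_pgrp *pgrp;	/* analogue of p_pgrp */
	int pid;
};

/* Models the pid scan: it reads pgrp->pg_id for every list member,
 * so any member with a NULL pgrp would be a NULL dereference. */
static bool
scan_would_fault(const struct model_proc *head)
{
	for (const struct model_proc *p = head; p != NULL; p = p->next)
		if (p->pgrp == NULL)
			return (true);
	return (false);
}

static void
list_remove(struct model_proc **head, struct model_proc *victim)
{
	assert(head != NULL && victim != NULL);
	for (struct model_proc **pp = head; *pp != NULL; pp = &(*pp)->next)
		if (*pp == victim) {
			*pp = victim->next;
			victim->next = NULL;
			return;
		}
}

/* Order quoted above: clear pgrp, then unlink. Returns whether a scan
 * running in the gap between the two steps would fault. */
static bool
reap_pgrp_first(struct model_proc **head, struct model_proc *p)
{
	p->pgrp = NULL;				/* leavepgrp() analogue */
	bool faults = scan_would_fault(*head);	/* scanner runs here */
	list_remove(head, p);
	return (faults);
}

/* Suggested order: unlink first, then clear pgrp. */
static bool
reap_unlink_first(struct model_proc **head, struct model_proc *p)
{
	list_remove(head, p);
	bool faults = scan_would_fault(*head);	/* scanner runs here */
	p->pgrp = NULL;
	return (faults);
}
```

In this model the first ordering leaves a list member with a NULL group pointer visible to the scan, while the second never does, because the process is already off the list when its group pointer is cleared.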

- Thomas

-- 
Thomas Moestl [EMAIL PROTECTED]   http://www.tu-bs.de/~y0015675/
  [EMAIL PROTECTED]   http://people.FreeBSD.org/~tmm/
PGP fingerprint: 1C97 A604 2BD0 E492 51D0  9C0F 1FE6 4F1D 419C 776C



NULL pointer problem in pid selection ?

2003-03-08 Thread Poul-Henning Kamp

Just got this crash on -current, and I believe I have seen similar
before.  addr2line(1) reports the faulting address to be
../../../kern/kern_fork.c:395
which is in the inner loop of pid collision avoidance.

Poul-Henning

Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic.id = 
fault virtual address   = 0x14
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc01c3eec
stack pointer   = 0x10:0xe74e3c74
frame pointer   = 0x10:0xe74e3cbc
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 99777 (sh)
trap number = 12
panic: page fault
cpuid = 0; lapic.id = 
Stack backtrace:
backtrace(c032ff8e,0,c03394ce,e74e3b68,1) at 0xc01d86a7 = backtrace+0x17
panic(c03394ce,c0342131,cfe5496c,1,1) at 0xc01d87ba = panic+0x10a
trap_fatal(e74e3c34,14,c03422ba,2e3,cfe4fa50) at 0xc02fa672 = trap_fatal+0x322
trap_pfault(e74e3c34,0,14,c035a038,14) at 0xc02fa322 = trap_pfault+0x1c2
trap(18,10,10,cf19c3f8,cf76b9ec) at 0xc02f9e9d = trap+0x3cd
calltrap() at 0xc02e2cd8 = calltrap+0x5
--- trap 0xc, eip = 0xc01c3eec, esp = 0xe74e3c74, ebp = 0xe74e3cbc ---
fork1(cfe4fa50,14,0,e74e3cd4,cfe54858) at 0xc01c3eec = fork1+0x3fc
fork(cfe4fa50,e74e3d10,c03422ba,404,0) at 0xc01c3852 = fork+0x52
syscall(2f,2f,2f,0,80ff000) at 0xc02fa98e = syscall+0x26e
Xint0x80_syscall() at 0xc02e2d2d = Xint0x80_syscall+0x1d
--- syscall (2), eip = 0x807ba9f, esp = 0xbfbff6bc, ebp = 0xbfbff6e8 ---
boot() called on cpu#0
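
A fault virtual address as small as 0x14 is what one would expect when a field is read through a NULL structure pointer: the address touched is simply the field's offset. The sketch below shows the arithmetic with a hypothetical 32-bit layout; struct model_pgrp and its field names are assumptions for illustration and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with 32-bit pointers modelled as uint32_t. */
struct model_pgrp {
	uint32_t hash_next;	/* 0x00 */
	uint32_t hash_prev;	/* 0x04 */
	uint32_t members;	/* 0x08 */
	uint32_t session;	/* 0x0c */
	uint32_t sigio;		/* 0x10 */
	int32_t  id;		/* 0x14 */
};

static_assert(offsetof(struct model_pgrp, id) == 0x14,
    "model layout places id at offset 0x14");

/* Address the CPU would touch when reading 'id' through 'base'. */
static uintptr_t
id_access_address(uintptr_t base)
{
	return (base + offsetof(struct model_pgrp, id));
}
```

With base equal to 0 (a NULL pointer) the computed address is 0x14, matching the shape of the trap above; which real field was involved would have to be confirmed against the actual structure definitions.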

-- 
Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED] | TCP/IP since RFC 956
FreeBSD committer   | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.



Re: NULL pointer problem in pid selection ?

2003-03-08 Thread Kris Kennaway
On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
 
 Just got this crash on -current, and I believe I have seen similar
 before.  addr2line(1) reports the faulting address to be
   ../../../kern/kern_fork.c:395
 which is in the inner loop of pid collision avoidance.

I've been running this patch from Alfred for the past month or so on
bento, which has fixed a similar panic I was seeing regularly.

Kris

Index: kern/kern_fork.c
===
RCS file: /home/ncvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.186
diff -u -r1.186 kern_fork.c
--- kern/kern_fork.c27 Feb 2003 02:05:17 -  1.186
+++ kern/kern_fork.c4 Mar 2003 00:28:09 -
@@ -325,6 +325,7 @@
 	 * exceed the limit. The variable nprocs is the current number of
 	 * processes, maxproc is the limit.
 	 */
+	sx_xlock(&proctree_lock);
 	sx_xlock(&allproc_lock);
 	uid = td->td_ucred->cr_ruid;
 	if ((nprocs >= maxproc - 10 && uid != 0) || nprocs >= maxproc) {
@@ -432,6 +433,7 @@
 	LIST_INSERT_HEAD(&allproc, p2, p_list);
 	LIST_INSERT_HEAD(PIDHASH(p2->p_pid), p2, p_hash);
 	sx_xunlock(&allproc_lock);
+	sx_xunlock(&proctree_lock);
 
 	/*
 	 * Malloc things while we don't hold any locks.
@@ -757,6 +759,7 @@
 	return (0);
 fail:
 	sx_xunlock(&allproc_lock);
+	sx_xunlock(&proctree_lock);
 	uma_zfree(proc_zone, newproc);
 	if (p1->p_flag & P_THREADED) {
 		PROC_LOCK(p1);


 
 Poul-Henning
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; lapic.id = 
 fault virtual address   = 0x14
 fault code  = supervisor read, page not present
 instruction pointer = 0x8:0xc01c3eec
 stack pointer   = 0x10:0xe74e3c74
 frame pointer   = 0x10:0xe74e3cbc
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 99777 (sh)
 trap number = 12
 panic: page fault
 cpuid = 0; lapic.id = 
 Stack backtrace:
 backtrace(c032ff8e,0,c03394ce,e74e3b68,1) at 0xc01d86a7 = backtrace+0x17
 panic(c03394ce,c0342131,cfe5496c,1,1) at 0xc01d87ba = panic+0x10a
 trap_fatal(e74e3c34,14,c03422ba,2e3,cfe4fa50) at 0xc02fa672 = trap_fatal+0x322
 trap_pfault(e74e3c34,0,14,c035a038,14) at 0xc02fa322 = trap_pfault+0x1c2
 trap(18,10,10,cf19c3f8,cf76b9ec) at 0xc02f9e9d = trap+0x3cd
 calltrap() at 0xc02e2cd8 = calltrap+0x5
 --- trap 0xc, eip = 0xc01c3eec, esp = 0xe74e3c74, ebp = 0xe74e3cbc ---
 fork1(cfe4fa50,14,0,e74e3cd4,cfe54858) at 0xc01c3eec = fork1+0x3fc
 fork(cfe4fa50,e74e3d10,c03422ba,404,0) at 0xc01c3852 = fork+0x52
 syscall(2f,2f,2f,0,80ff000) at 0xc02fa98e = syscall+0x26e
 Xint0x80_syscall() at 0xc02e2d2d = Xint0x80_syscall+0x1d
 --- syscall (2), eip = 0x807ba9f, esp = 0xbfbff6bc, ebp = 0xbfbff6e8 ---
 boot() called on cpu#0
 
 -- 
 Poul-Henning Kamp   | UNIX since Zilog Zeus 3.20
 [EMAIL PROTECTED] | TCP/IP since RFC 956
 FreeBSD committer   | BSD since 4.3-tahoe
 Never attribute to malice what can adequately be explained by incompetence.
 

