Re: NULL pointer problem in pid selection ?
On 08-Mar-2003 Kris Kennaway wrote:
> On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
> > Just got this crash on -current, and I believe I have seen similar before.
> >
> > addr2line(1) reports the faulting address to be
> >	../../../kern/kern_fork.c:395
> > which is in the inner loop of pid collision avoidance.
>
> I've been running this patch from Alfred for the past month or so on
> bento, which has fixed a similar panic I was seeing regularly.

Using just a shared lock instead of an xlock should be ok there. You
aren't modifying the process tree, just looking at it. OTOH, the proc
lock is supposed to protect p_pgrp and p_session, so they shouldn't be
NULL. :(

> Kris
>
> Index: kern/kern_fork.c
> ===
> RCS file: /home/ncvs/src/sys/kern/kern_fork.c,v
> retrieving revision 1.186
> diff -u -r1.186 kern_fork.c
> --- kern/kern_fork.c 27 Feb 2003 02:05:17 - 1.186
> +++ kern/kern_fork.c 4 Mar 2003 00:28:09 -
> @@ -325,6 +325,7 @@
>  	 * exceed the limit. The variable nprocs is the current number of
>  	 * processes, maxproc is the limit.
>  	 */
> +	sx_xlock(&proctree_lock);
>  	sx_xlock(&allproc_lock);
>  	uid = td->td_ucred->cr_ruid;
>  	if ((nprocs >= maxproc - 10 && uid != 0) || nprocs >= maxproc) {
> @@ -432,6 +433,7 @@
>  	LIST_INSERT_HEAD(&allproc, p2, p_list);
>  	LIST_INSERT_HEAD(PIDHASH(p2->p_pid), p2, p_hash);
>  	sx_xunlock(&allproc_lock);
> +	sx_xunlock(&proctree_lock);
>  
>  	/*
>  	 * Malloc things while we don't hold any locks.
> @@ -757,6 +759,7 @@
>  	return (0);
>  fail:
>  	sx_xunlock(&allproc_lock);
> +	sx_xunlock(&proctree_lock);
>  	uma_zfree(proc_zone, newproc);
>  	if (p1->p_flag & P_THREADED) {
>  		PROC_LOCK(p1);

-- 
John Baldwin [EMAIL PROTECTED] http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message
Re: NULL pointer problem in pid selection ?
On Mon, Mar 10, 2003 at 01:00:15PM -0500, John Baldwin wrote:
> On 08-Mar-2003 Kris Kennaway wrote:
> > On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
> > > Just got this crash on -current, and I believe I have seen similar before.
> > >
> > > addr2line(1) reports the faulting address to be
> > >	../../../kern/kern_fork.c:395
> > > which is in the inner loop of pid collision avoidance.
> >
> > I've been running this patch from Alfred for the past month or so on
> > bento, which has fixed a similar panic I was seeing regularly.
>
> Using just a shared lock instead of an xlock should be ok there. You
> aren't modifying the process tree, just looking at it. OTOH, the proc
> lock is supposed to protect p_pgrp and p_session, so they shouldn't be
> NULL. :(

I have a suspicion that the bug is actually in wait1():

	sx_xlock(&proctree_lock);
	[...]
	/*
	 * Remove other references to this process to ensure
	 * we have an exclusive reference.
	 */
	leavepgrp(p);

	sx_xlock(&allproc_lock);
	LIST_REMOVE(p, p_list);	/* off zombproc */
	sx_xunlock(&allproc_lock);

	LIST_REMOVE(p, p_sibling);
	sx_xunlock(&proctree_lock);

Shouldn't we be removing the process from zombproc before setting
p_pgrp to NULL via leavepgrp()? Does this even matter at all when both
fork1() and wait1() are still protected by Giant?

Tim
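The ordering question raised above can be tried out in userland. What follows is only a minimal sketch under stated assumptions, not the kernel code: `struct demo_proc`, `struct demo_pgrp` and every function name are hypothetical, a plain singly linked list stands in for zombproc, setting `p_pgrp` to NULL stands in for leavepgrp(), and the "scanner" is called synchronously at the point where wait1() could block on allproc_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel structures. */
struct demo_pgrp { int pg_id; };

struct demo_proc {
	struct demo_proc *next;		/* link on the stand-in zombie list */
	struct demo_pgrp *p_pgrp;	/* NULL once the group has been left */
};

/* Returns 1 if any process still on the list has p_pgrp == NULL. */
int
list_has_null_pgrp(const struct demo_proc *head)
{
	for (const struct demo_proc *p = head; p != NULL; p = p->next)
		if (p->p_pgrp == NULL)
			return (1);
	return (0);
}

/* Unlink p from the singly linked list rooted at *headp. */
void
list_remove(struct demo_proc **headp, struct demo_proc *p)
{
	for (struct demo_proc **pp = headp; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == p) {
			*pp = p->next;
			p->next = NULL;
			return;
		}
	}
}

/*
 * Order quoted from wait1(): leave the group first, unlink second.
 * Returns what a list scanner would observe in the window between
 * the two steps.
 */
int
reap_pgrp_first(struct demo_proc **headp, struct demo_proc *p)
{
	int seen;

	p->p_pgrp = NULL;			/* stands in for leavepgrp(p) */
	seen = list_has_null_pgrp(*headp);	/* scanner runs in the window */
	list_remove(headp, p);
	return (seen);
}

/*
 * Suggested order: unlink first, leave the group second.  The scanner
 * is run after each step.
 */
int
reap_unlink_first(struct demo_proc **headp, struct demo_proc *p)
{
	int seen;

	list_remove(headp, p);
	seen = list_has_null_pgrp(*headp);
	p->p_pgrp = NULL;			/* stands in for leavepgrp(p) */
	seen |= list_has_null_pgrp(*headp);
	return (seen);
}
```

With a two-entry list, `reap_pgrp_first()` returns 1, because the process is still listed while its group pointer is already NULL, and `reap_unlink_first()` returns 0. The sketch says nothing about which locks must be held in the real code; it only shows why the order of the two steps matters to a concurrent list walker.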
Re: NULL pointer problem in pid selection ?
On 10-Mar-2003 Tim Robbins wrote:
> On Mon, Mar 10, 2003 at 01:00:15PM -0500, John Baldwin wrote:
> > On 08-Mar-2003 Kris Kennaway wrote:
> > > On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
> > > > Just got this crash on -current, and I believe I have seen similar before.
> > > >
> > > > addr2line(1) reports the faulting address to be
> > > >	../../../kern/kern_fork.c:395
> > > > which is in the inner loop of pid collision avoidance.
> > >
> > > I've been running this patch from Alfred for the past month or so on
> > > bento, which has fixed a similar panic I was seeing regularly.
> >
> > Using just a shared lock instead of an xlock should be ok there. You
> > aren't modifying the process tree, just looking at it. OTOH, the proc
> > lock is supposed to protect p_pgrp and p_session, so they shouldn't be
> > NULL. :(
>
> I have a suspicion that the bug is actually in wait1():
>
>	sx_xlock(&proctree_lock);
>	[...]
>	/*
>	 * Remove other references to this process to ensure
>	 * we have an exclusive reference.
>	 */
>	leavepgrp(p);
>
>	sx_xlock(&allproc_lock);
>	LIST_REMOVE(p, p_list);	/* off zombproc */
>	sx_xunlock(&allproc_lock);
>
>	LIST_REMOVE(p, p_sibling);
>	sx_xunlock(&proctree_lock);
>
> Shouldn't we be removing the process from zombproc before setting
> p_pgrp to NULL via leavepgrp()? Does this even matter at all when both
> fork1() and wait1() are still protected by Giant?

Giant doesn't help you with sleeps. However, removing the process from
zombproc before destroying its other linkages might be more correct,
yes.

> Tim

-- 
John Baldwin [EMAIL PROTECTED] http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
Re: NULL pointer problem in pid selection ?
On Tue, 2003/03/11 at 08:43:46 +1100, Tim Robbins wrote:
> On Mon, Mar 10, 2003 at 01:00:15PM -0500, John Baldwin wrote:
> > On 08-Mar-2003 Kris Kennaway wrote:
> > > On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
> > > > Just got this crash on -current, and I believe I have seen similar before.
> > > >
> > > > addr2line(1) reports the faulting address to be
> > > >	../../../kern/kern_fork.c:395
> > > > which is in the inner loop of pid collision avoidance.
> > >
> > > I've been running this patch from Alfred for the past month or so on
> > > bento, which has fixed a similar panic I was seeing regularly.
> >
> > Using just a shared lock instead of an xlock should be ok there. You
> > aren't modifying the process tree, just looking at it. OTOH, the proc
> > lock is supposed to protect p_pgrp and p_session, so they shouldn't be
> > NULL. :(
>
> I have a suspicion that the bug is actually in wait1():
>
>	sx_xlock(&proctree_lock);
>	[...]
>	/*
>	 * Remove other references to this process to ensure
>	 * we have an exclusive reference.
>	 */
>	leavepgrp(p);
>
>	sx_xlock(&allproc_lock);
>	LIST_REMOVE(p, p_list);	/* off zombproc */
>	sx_xunlock(&allproc_lock);
>
>	LIST_REMOVE(p, p_sibling);
>	sx_xunlock(&proctree_lock);
>
> Shouldn't we be removing the process from zombproc before setting
> p_pgrp to NULL via leavepgrp()? Does this even matter at all when both
> fork1() and wait1() are still protected by Giant?

Hmmm, I think you're right; if allproc_lock happens to be contested in
fork1() (which can happen because it is locked without Giant held in
some places, and because sleeping with an sx lock is allowed), we'll go
to sleep there, dropping Giant.
This opens up a race, since wait1() can now proceed until after the
leavepgrp() before blocking; when allproc_lock is released, fork1() will
be the first to pick it up, and this panic will happen.

Seems that I relied on Giant too much when I first took a look into that
code :)

	- Thomas

-- 
Thomas Moestl [EMAIL PROTECTED] http://www.tu-bs.de/~y0015675/
              [EMAIL PROTECTED] http://people.FreeBSD.org/~tmm/
PGP fingerprint: 1C97 A604 2BD0 E492 51D0 9C0F 1FE6 4F1D 419C 776C
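The fork1() side of the race described above can be pictured with a small userland sketch. It is an illustration under stated assumptions, not the code in kern_fork.c: the `scan_*` names are hypothetical, sessions are left out, and the explicit NULL test exists only so the sketch can report the dangerous case instead of crashing. It is not a fix proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel structures. */
struct scan_pgrp { int pg_id; };

struct scan_proc {
	struct scan_proc *next;		/* link on the list being walked */
	int               p_pid;
	struct scan_pgrp *p_pgrp;	/* NULL after the group was left */
};

enum scan_result {
	SCAN_FREE,		/* candidate pid is not in use */
	SCAN_IN_USE,		/* candidate collides with a pid or pgrp id */
	SCAN_WOULD_FAULT	/* a listed process has p_pgrp == NULL */
};

/*
 * Walk the list looking for a collision with trypid, the way a pid
 * allocator has to.  Reading p->p_pgrp->pg_id is the step that faults
 * when a half-torn-down process is still on the list, so the sketch
 * reports that case instead of dereferencing.
 */
enum scan_result
scan_for_pid(const struct scan_proc *head, int trypid)
{
	for (const struct scan_proc *p = head; p != NULL; p = p->next) {
		if (p->p_pid == trypid)
			return (SCAN_IN_USE);
		if (p->p_pgrp == NULL)
			return (SCAN_WOULD_FAULT);
		if (p->p_pgrp->pg_id == trypid)
			return (SCAN_IN_USE);
	}
	return (SCAN_FREE);
}
```

If the walker gets the list lock while a reaped process has already left its group but is still linked, `scan_for_pid()` returns `SCAN_WOULD_FAULT`; a loop without the check would read through the NULL pointer at that step.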
NULL pointer problem in pid selection ?
Just got this crash on -current, and I believe I have seen similar before.

addr2line(1) reports the faulting address to be
	../../../kern/kern_fork.c:395
which is in the inner loop of pid collision avoidance.

Poul-Henning

Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic.id =
fault virtual address   = 0x14
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01c3eec
stack pointer           = 0x10:0xe74e3c74
frame pointer           = 0x10:0xe74e3cbc
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 99777 (sh)
trap number             = 12
panic: page fault
cpuid = 0; lapic.id =
Stack backtrace:
backtrace(c032ff8e,0,c03394ce,e74e3b68,1) at 0xc01d86a7 = backtrace+0x17
panic(c03394ce,c0342131,cfe5496c,1,1) at 0xc01d87ba = panic+0x10a
trap_fatal(e74e3c34,14,c03422ba,2e3,cfe4fa50) at 0xc02fa672 = trap_fatal+0x322
trap_pfault(e74e3c34,0,14,c035a038,14) at 0xc02fa322 = trap_pfault+0x1c2
trap(18,10,10,cf19c3f8,cf76b9ec) at 0xc02f9e9d = trap+0x3cd
calltrap() at 0xc02e2cd8 = calltrap+0x5
--- trap 0xc, eip = 0xc01c3eec, esp = 0xe74e3c74, ebp = 0xe74e3cbc ---
fork1(cfe4fa50,14,0,e74e3cd4,cfe54858) at 0xc01c3eec = fork1+0x3fc
fork(cfe4fa50,e74e3d10,c03422ba,404,0) at 0xc01c3852 = fork+0x52
syscall(2f,2f,2f,0,80ff000) at 0xc02fa98e = syscall+0x26e
Xint0x80_syscall() at 0xc02e2d2d = Xint0x80_syscall+0x1d
--- syscall (2), eip = 0x807ba9f, esp = 0xbfbff6bc, ebp = 0xbfbff6e8 ---
boot() called on cpu#0

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
[EMAIL PROTECTED]       | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
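A note on reading the trap: a fault virtual address as small as 0x14 is what one would expect from reading a structure member through a NULL pointer, because the address touched is then just the member's offset. The sketch below shows only that arithmetic. `struct layout_pgrp` and its padding array are made up for the illustration and are not claimed to match the kernel's structures; the function names are hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout for illustration only.  The padding stands in
 * for whatever members precede pg_id; this is NOT the kernel's
 * struct pgrp.
 */
struct layout_pgrp {
	uint32_t pad[5];	/* 5 * 4 = 20 bytes of leading members */
	int32_t  pg_id;		/* lands at offset 20 = 0x14 */
};

/*
 * Address that would be read for a member at the given offset from
 * base, computed arithmetically so that nothing is dereferenced.
 */
uintptr_t
member_address(uintptr_t base, size_t offset)
{
	return (base + offset);
}

/* Address touched when pg_id is read through a NULL (0) pointer. */
uintptr_t
null_deref_address(void)
{
	return (member_address(0, offsetof(struct layout_pgrp, pg_id)));
}
```

Under this assumed layout `null_deref_address()` evaluates to 0x14. Which member was actually being read at kern_fork.c:395 has to be checked against the real structure definitions for that kernel.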
Re: NULL pointer problem in pid selection ?
On Sat, Mar 08, 2003 at 11:46:34AM +0100, Poul-Henning Kamp wrote:
> Just got this crash on -current, and I believe I have seen similar before.
>
> addr2line(1) reports the faulting address to be
>	../../../kern/kern_fork.c:395
> which is in the inner loop of pid collision avoidance.

I've been running this patch from Alfred for the past month or so on
bento, which has fixed a similar panic I was seeing regularly.

Kris

Index: kern/kern_fork.c
===
RCS file: /home/ncvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.186
diff -u -r1.186 kern_fork.c
--- kern/kern_fork.c 27 Feb 2003 02:05:17 - 1.186
+++ kern/kern_fork.c 4 Mar 2003 00:28:09 -
@@ -325,6 +325,7 @@
 	 * exceed the limit. The variable nprocs is the current number of
 	 * processes, maxproc is the limit.
 	 */
+	sx_xlock(&proctree_lock);
 	sx_xlock(&allproc_lock);
 	uid = td->td_ucred->cr_ruid;
 	if ((nprocs >= maxproc - 10 && uid != 0) || nprocs >= maxproc) {
@@ -432,6 +433,7 @@
 	LIST_INSERT_HEAD(&allproc, p2, p_list);
 	LIST_INSERT_HEAD(PIDHASH(p2->p_pid), p2, p_hash);
 	sx_xunlock(&allproc_lock);
+	sx_xunlock(&proctree_lock);
 
 	/*
 	 * Malloc things while we don't hold any locks.
@@ -757,6 +759,7 @@
 	return (0);
 fail:
 	sx_xunlock(&allproc_lock);
+	sx_xunlock(&proctree_lock);
 	uma_zfree(proc_zone, newproc);
 	if (p1->p_flag & P_THREADED) {
 		PROC_LOCK(p1);