Re: A possible bug in the interrupt thread preemption code [Was:
John Baldwin wrote:

> On 22-Feb-01 Maxim Sobolev wrote:
> > John Baldwin wrote:
> >>
> >> A recursive sched_lock? Erm, well, stick these options in your kernel
> >> config:
> >>
> >> options KTR
> >> options KTR_EXTEND
> >> options KTR_COMPILE=KTR_LOCK
> >> options KTR_MASK=KTR_MASK
> >
> > Bah, it doesn't even compile with these options:
> > cc -c -pipe -O -march=pentium -Wall -Wredundant-decls -Wnested-externs
> > -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline
> > -Wcast-qual -fformat-extensions -ansi -nostdinc -I- -I. -I../.. -I../../dev
> > -I../../../include -I../../contrib/dev/acpica/Subsystem/Include -D_KERNEL
> > -include opt_global.h -elf -mpreferred-stack-boundary=2 ../../kern/kern_ktr.c
> > ../../kern/kern_ktr.c: In function `__Tunable_ktr_mask':
> > ../../kern/kern_ktr.c:95: `KTR_MASK' undeclared (first use in this function)
> > ../../kern/kern_ktr.c:95: (Each undeclared identifier is reported only once
> > ../../kern/kern_ktr.c:95: for each function it appears in.)
> > *** Error code 1
> > 1 error
>
> Oh, whoops, that should be:
>
> options KTR_MASK=KTR_LOCK

Update: I'm still unable to boot the kernel on my machine, even into single
user. The following is the backtrace from ddb (after commenting out
enable_intr() in trap.c::trap() as usual):

Fatal trap 9: general protection fault while in kernel mode
instruction pointer = 0x8:0xc0265e36
stack pointer = 0x10:0xc3577f50
frame pointer = 0x10:0xc3577f64
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor flags = resume, IOPL = 0
current process = 16 (irq14: ata0)
kernel: type 9 trap, code=0
Stopped at sw1b+0x7c: ltr %si
db> trace
sw1b(c0147c74, c0147c74, 0, c32c1da0, c3577f94) at sw1b+0x7c
ithread_loop(c0741c00, c3577fa8) at ithread_loop+0x67b
fork_exit(c0147c74, c0741c00, c3577fa8) at fork_exit+0xd6
fork_trampoline() at fork_trampoline+0x8

-Maxim

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-current" in the body of the message
Re: A possible bug in the interrupt thread preemption code [Was:
John Baldwin wrote:

> On 22-Feb-01 Maxim Sobolev wrote:
> > John Baldwin wrote:
> >
> >> On 22-Feb-01 Maxim Sobolev wrote:
> >>
> >> >> > Here it is (from DDB):
> >> >> > panic(c027de93,c0297409,c027f878,368,80286)
> >> >> > _mtx_assert(c02ea000,9,c027f878,368,80286)
> >> >> > mi_switch(c32c5da0,3,c02cea44,c357be98)
> >> >> > ithread_schedule(c0747c00,1)
> >> >> > sched_ithd(e)
> >> >> > Xresume14()
> >> >> > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> >> >> > trap(18, 10, 10,c01597b6,20)
> >> >> > calltrap()
> >> >> > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> >> >> > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> >> >> > ithread_loop(c0747c00,c357bfa8)
> >> >> > fork_exit(c0146cbc,c0747c00,c357bfa8)
> >> >> > fork_trampoline()
> >> >>
> >> >> *sigh* This is why enabling interrupts in trap() is such a bad idea.
> >> >> If we get a trap in the scheduler, then lots of bad crap starts to
> >> >> happen because we can get an interrupt while we are in a trap. :(
> >> >> Can you compile your kernel with INVARIANTS on though, as I think the
> >> >> kernel should've panic'd earlier if it is doing what I think it is
> >> >> doing.
> >> >
> >> > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
> >>
> >> Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.
> >
> > It doesn't really matter, because the system can't even boot into
> > single-user due to the panic.
> >
> >> >> Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
> >> >> comment out the enable_intr() call near the beginning of the trap()
> >> >> function right after the printf for 'kernel trap %d with interrupts
> >> >> disabled'.
> >> >
> >> > Ok, I'll try that.
> >> >
> >> > -Maxim
> >>
> >> It will still panic, just hopefully a better panic.
> >
> > I did understand that, but the panic I see after the change is exactly
> > the same as before. Any other ideas?
>
> A recursive sched_lock? Erm, well, stick these options in your kernel config:
>
> options KTR
> options KTR_EXTEND
> options KTR_COMPILE=KTR_LOCK
> options KTR_MASK=KTR_MASK
>
> Then when it panics, use the 'show ktr' command to list the mutex operations
> up until that point. Hopefully you can see where it is grabbing sched lock
> the first time and then not releasing it.

Got the following:

724: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
723: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
722: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
721: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
680: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
679: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
569: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
568: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
546: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
545: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
544: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
543: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
515: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
366: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
365: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
317: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
254: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
253: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
252: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
251: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
194: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
193: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
182: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
181: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
46: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350
1020: REL (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:438
1019: GOT (spin) sched lock [0xc030c820] r=0 at ../../kern/kern_clock.c:350

-Maxim
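The check John asks for (find the acquire of sched lock that is never
released) can be sketched in C. This is only an illustration under stated
assumptions: struct lock_op and first_unreleased_acquire() are hypothetical
names, not part of KTR or ddb, and the records are assumed to have been
transcribed by hand from the 'show ktr' listing and put in oldest-first order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical record: one line of a KTR_LOCK listing reduced to its
 * entry number and operation ("GOT" or "REL").
 */
struct lock_op {
    int entry;
    const char *op;
};

/*
 * Scan records in chronological order (oldest first).  Return the entry
 * number of the first GOT that is followed by another GOT with no REL
 * in between, or -1 if every GOT is released before the next one.
 */
static int first_unreleased_acquire(const struct lock_op *ops, size_t n)
{
    int held = 0;       /* non-zero while the lock appears to be held */
    int last_got = -1;  /* entry number of the most recent GOT */
    size_t i;

    assert(ops != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (strcmp(ops[i].op, "GOT") == 0) {
            if (held)
                return last_got;  /* earlier acquire was never released */
            held = 1;
            last_got = ops[i].entry;
        } else if (strcmp(ops[i].op, "REL") == 0) {
            held = 0;
        }
    }
    return -1;
}
```

Fed the tail of the listing above in reverse (assuming 'show ktr' prints the
newest entries first, so the order is 1019, 1020, 46, 181, 182), this flags
entry 46. The entry numbers in the listing have gaps, so that may simply
reflect records missing from the listing rather than a real leak.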
Re: A possible bug in the interrupt thread preemption code [Was:
On 22-Feb-01 Maxim Sobolev wrote:
> John Baldwin wrote:
>>
>> A recursive sched_lock? Erm, well, stick these options in your kernel
>> config:
>>
>> options KTR
>> options KTR_EXTEND
>> options KTR_COMPILE=KTR_LOCK
>> options KTR_MASK=KTR_MASK
>
> Bah, it doesn't even compile with these options:
> cc -c -pipe -O -march=pentium -Wall -Wredundant-decls -Wnested-externs
> -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline
> -Wcast-qual -fformat-extensions -ansi -nostdinc -I- -I. -I../.. -I../../dev
> -I../../../include -I../../contrib/dev/acpica/Subsystem/Include -D_KERNEL
> -include opt_global.h -elf -mpreferred-stack-boundary=2 ../../kern/kern_ktr.c
> ../../kern/kern_ktr.c: In function `__Tunable_ktr_mask':
> ../../kern/kern_ktr.c:95: `KTR_MASK' undeclared (first use in this function)
> ../../kern/kern_ktr.c:95: (Each undeclared identifier is reported only once
> ../../kern/kern_ktr.c:95: for each function it appears in.)
> *** Error code 1
> 1 error

Oh, whoops, that should be:

options KTR_MASK=KTR_LOCK

> -Maxim

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: A possible bug in the interrupt thread preemption code [Was:
John Baldwin wrote:

> On 22-Feb-01 Maxim Sobolev wrote:
> > John Baldwin wrote:
> >
> >> On 22-Feb-01 Maxim Sobolev wrote:
> >>
> >> >> > Here it is (from DDB):
> >> >> > panic(c027de93,c0297409,c027f878,368,80286)
> >> >> > _mtx_assert(c02ea000,9,c027f878,368,80286)
> >> >> > mi_switch(c32c5da0,3,c02cea44,c357be98)
> >> >> > ithread_schedule(c0747c00,1)
> >> >> > sched_ithd(e)
> >> >> > Xresume14()
> >> >> > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> >> >> > trap(18, 10, 10,c01597b6,20)
> >> >> > calltrap()
> >> >> > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> >> >> > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> >> >> > ithread_loop(c0747c00,c357bfa8)
> >> >> > fork_exit(c0146cbc,c0747c00,c357bfa8)
> >> >> > fork_trampoline()
> >> >>
> >> >> *sigh* This is why enabling interrupts in trap() is such a bad idea.
> >> >> If we get a trap in the scheduler, then lots of bad crap starts to
> >> >> happen because we can get an interrupt while we are in a trap. :(
> >> >> Can you compile your kernel with INVARIANTS on though, as I think the
> >> >> kernel should've panic'd earlier if it is doing what I think it is
> >> >> doing.
> >> >
> >> > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
> >>
> >> Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.
> >
> > It doesn't really matter, because the system can't even boot into
> > single-user due to the panic.
> >
> >> >> Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
> >> >> comment out the enable_intr() call near the beginning of the trap()
> >> >> function right after the printf for 'kernel trap %d with interrupts
> >> >> disabled'.
> >> >
> >> > Ok, I'll try that.
> >> >
> >> > -Maxim
> >>
> >> It will still panic, just hopefully a better panic.
> >
> > I did understand that, but the panic I see after the change is exactly
> > the same as before. Any other ideas?
>
> A recursive sched_lock? Erm, well, stick these options in your kernel config:
>
> options KTR
> options KTR_EXTEND
> options KTR_COMPILE=KTR_LOCK
> options KTR_MASK=KTR_MASK

Bah, it doesn't even compile with these options:

cc -c -pipe -O -march=pentium -Wall -Wredundant-decls -Wnested-externs
-Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline
-Wcast-qual -fformat-extensions -ansi -nostdinc -I- -I. -I../.. -I../../dev
-I../../../include -I../../contrib/dev/acpica/Subsystem/Include -D_KERNEL
-include opt_global.h -elf -mpreferred-stack-boundary=2 ../../kern/kern_ktr.c
../../kern/kern_ktr.c: In function `__Tunable_ktr_mask':
../../kern/kern_ktr.c:95: `KTR_MASK' undeclared (first use in this function)
../../kern/kern_ktr.c:95: (Each undeclared identifier is reported only once
../../kern/kern_ktr.c:95: for each function it appears in.)
*** Error code 1
1 error

-Maxim
Re: A possible bug in the interrupt thread preemption code [Was:
John Baldwin wrote:

> On 22-Feb-01 Maxim Sobolev wrote:
> > John Baldwin wrote:
> >
> >> On 22-Feb-01 Maxim Sobolev wrote:
> >>
> >> >> > Here it is (from DDB):
> >> >> > panic(c027de93,c0297409,c027f878,368,80286)
> >> >> > _mtx_assert(c02ea000,9,c027f878,368,80286)
> >> >> > mi_switch(c32c5da0,3,c02cea44,c357be98)
> >> >> > ithread_schedule(c0747c00,1)
> >> >> > sched_ithd(e)
> >> >> > Xresume14()
> >> >> > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> >> >> > trap(18, 10, 10,c01597b6,20)
> >> >> > calltrap()
> >> >> > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> >> >> > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> >> >> > ithread_loop(c0747c00,c357bfa8)
> >> >> > fork_exit(c0146cbc,c0747c00,c357bfa8)
> >> >> > fork_trampoline()
> >> >>
> >> >> *sigh* This is why enabling interrupts in trap() is such a bad idea.
> >> >> If we get a trap in the scheduler, then lots of bad crap starts to
> >> >> happen because we can get an interrupt while we are in a trap. :(
> >> >> Can you compile your kernel with INVARIANTS on though, as I think the
> >> >> kernel should've panic'd earlier if it is doing what I think it is
> >> >> doing.
> >> >
> >> > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
> >>
> >> Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.
> >
> > It doesn't really matter, because the system can't even boot into
> > single-user due to the panic.
> >
> >> >> Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
> >> >> comment out the enable_intr() call near the beginning of the trap()
> >> >> function right after the printf for 'kernel trap %d with interrupts
> >> >> disabled'.
> >> >
> >> > Ok, I'll try that.
> >> >
> >> > -Maxim
> >>
> >> It will still panic, just hopefully a better panic.
> >
> > I did understand that, but the panic I see after the change is exactly
> > the same as before. Any other ideas?
>
> A recursive sched_lock? Erm, well, stick these options in your kernel config:
>
> options KTR
> options KTR_EXTEND
> options KTR_COMPILE=KTR_LOCK
> options KTR_MASK=KTR_MASK
>
> Then when it panics, use the 'show ktr' command to list the mutex operations
> up until that point. Hopefully you can see where it is grabbing sched lock
> the first time and then not releasing it.

Ok, I'll do it and send the results later.

> Also, has the backtrace changed at all?
> If not, then you may have commented out the wrong enable_intr(). :)

I did what you suggested. Please see the attached diff.

-Maxim

--- src/sys/i386/i386/trap.c 2001/02/22 16:20:12 1.1
+++ src/sys/i386/i386/trap.c 2001/02/22 16:20:58
@@ -264,7 +264,7 @@
                          * We should walk p_heldmtx here and see if any are
                          * spin mutexes, and not do this if so.
                          */
-                        enable_intr();
+/*                      enable_intr();*/
                 }
         }
Re: A possible bug in the interrupt thread preemption code [Was:
On 22-Feb-01 Maxim Sobolev wrote:
> John Baldwin wrote:
>
>> On 22-Feb-01 Maxim Sobolev wrote:
>>
>> >> > Here it is (from DDB):
>> >> > panic(c027de93,c0297409,c027f878,368,80286)
>> >> > _mtx_assert(c02ea000,9,c027f878,368,80286)
>> >> > mi_switch(c32c5da0,3,c02cea44,c357be98)
>> >> > ithread_schedule(c0747c00,1)
>> >> > sched_ithd(e)
>> >> > Xresume14()
>> >> > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
>> >> > trap(18, 10, 10,c01597b6,20)
>> >> > calltrap()
>> >> > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
>> >> > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
>> >> > ithread_loop(c0747c00,c357bfa8)
>> >> > fork_exit(c0146cbc,c0747c00,c357bfa8)
>> >> > fork_trampoline()
>> >>
>> >> *sigh* This is why enabling interrupts in trap() is such a bad idea.
>> >> If we get a trap in the scheduler, then lots of bad crap starts to
>> >> happen because we can get an interrupt while we are in a trap. :(
>> >> Can you compile your kernel with INVARIANTS on though, as I think the
>> >> kernel should've panic'd earlier if it is doing what I think it is
>> >> doing.
>> >
>> > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
>>
>> Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.
>
> It doesn't really matter, because the system can't even boot into
> single-user due to the panic.
>
>> >> Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
>> >> comment out the enable_intr() call near the beginning of the trap()
>> >> function right after the printf for 'kernel trap %d with interrupts
>> >> disabled'.
>> >
>> > Ok, I'll try that.
>> >
>> > -Maxim
>>
>> It will still panic, just hopefully a better panic.
>
> I did understand that, but the panic I see after the change is exactly
> the same as before. Any other ideas?

A recursive sched_lock? Erm, well, stick these options in your kernel config:

options KTR
options KTR_EXTEND
options KTR_COMPILE=KTR_LOCK
options KTR_MASK=KTR_MASK

Then when it panics, use the 'show ktr' command to list the mutex operations up
until that point. Hopefully you can see where it is grabbing sched lock the
first time and then not releasing it.

Also, has the backtrace changed at all? If not, then you may have commented out
the wrong enable_intr(). :)

> -Maxim

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: A possible bug in the interrupt thread preemption code [Was:
John Baldwin wrote:

> On 22-Feb-01 Maxim Sobolev wrote:
> >
> >> > Here it is (from DDB):
> >> > panic(c027de93,c0297409,c027f878,368,80286)
> >> > _mtx_assert(c02ea000,9,c027f878,368,80286)
> >> > mi_switch(c32c5da0,3,c02cea44,c357be98)
> >> > ithread_schedule(c0747c00,1)
> >> > sched_ithd(e)
> >> > Xresume14()
> >> > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> >> > trap(18, 10, 10,c01597b6,20)
> >> > calltrap()
> >> > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> >> > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> >> > ithread_loop(c0747c00,c357bfa8)
> >> > fork_exit(c0146cbc,c0747c00,c357bfa8)
> >> > fork_trampoline()
> >>
> >> *sigh* This is why enabling interrupts in trap() is such a bad idea.
> >> If we get a trap in the scheduler, then lots of bad crap starts to
> >> happen because we can get an interrupt while we are in a trap. :(
> >> Can you compile your kernel with INVARIANTS on though, as I think the
> >> kernel should've panic'd earlier if it is doing what I think it is
> >> doing.
> >
> > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
>
> Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.

It doesn't really matter, because the system can't even boot into single-user
due to the panic.

> >> Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
> >> comment out the enable_intr() call near the beginning of the trap()
> >> function right after the printf for 'kernel trap %d with interrupts
> >> disabled'.
> >
> > Ok, I'll try that.
> >
> > -Maxim
>
> It will still panic, just hopefully a better panic.

I did understand that, but the panic I see after the change is exactly the
same as before. Any other ideas?

-Maxim
Re: A possible bug in the interrupt thread preemption code [Was:
On 22-Feb-01 Dag-Erling Smorgrav wrote:
> John Baldwin <[EMAIL PROTECTED]> writes:
>> On 22-Feb-01 Maxim Sobolev wrote:
>> > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
>> Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.
>
> For the same reason, you probably want WITNESS_SKIPSPIN.

Not really. WITNESS doesn't really bog down spin mutexes all that much. It has
a very simple order checking that is nothing like the order checking for sleep
mutexes. The killer for MUTEX_DEBUG is that each mtx_init() involves walking a
linked list of _all_ of the mutexes in the system and comparing each one with
the one being init'd to check for a duplicate init.

> WITNESS_DDB is a bad idea, BTW, there's a (presumably harmless) lock
> order reversal in the FS code that you're practically guaranteed to
> hit during boot.

Well, they aren't necessarily harmless, but they've been around for a very long
time, so if they do cause lockups, the lockups are rare at least.

> DES
> --
> Dag-Erling Smorgrav - [EMAIL PROTECTED]

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
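To illustrate why the duplicate-init check described above gets expensive,
here is a minimal sketch in C. It is not the FreeBSD implementation: struct
demo_mtx, demo_mtx_init() and the compares counter are hypothetical names
invented for this example, and the only thing assumed from the message is
that every init walks a list of all existing mutexes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel mutex; not FreeBSD's struct mtx. */
struct demo_mtx {
    const char *name;
    struct demo_mtx *next;   /* link in the global list of all mutexes */
};

static struct demo_mtx *all_mutexes;  /* head of the global list */
static unsigned long compares;        /* list nodes visited so far */

/*
 * Initialize a mutex with a debug-style duplicate check: walk every
 * mutex already registered and compare it with the one being
 * initialized.  Returns 0 on success, -1 if m was already initialized.
 */
static int demo_mtx_init(struct demo_mtx *m, const char *name)
{
    struct demo_mtx *p;

    assert(m != NULL);
    for (p = all_mutexes; p != NULL; p = p->next) {
        compares++;
        if (p == m)
            return -1;       /* duplicate init detected */
    }
    m->name = name;
    m->next = all_mutexes;   /* register at the head of the list */
    all_mutexes = m;
    return 0;
}
```

In this model the k-th successful init visits k-1 nodes, so initializing n
mutexes costs n(n-1)/2 comparisons in total, which grows quadratically with
the number of mutexes in the system.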
Re: A possible bug in the interrupt thread preemption code [Was:
John Baldwin <[EMAIL PROTECTED]> writes:
> On 22-Feb-01 Maxim Sobolev wrote:
> > It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.
> Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.

For the same reason, you probably want WITNESS_SKIPSPIN.

WITNESS_DDB is a bad idea, BTW, there's a (presumably harmless) lock
order reversal in the FS code that you're practically guaranteed to
hit during boot.

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: A possible bug in the interrupt thread preemption code [Was:
On 22-Feb-01 Maxim Sobolev wrote:
>> > Here it is (from DDB):
>> > panic(c027de93,c0297409,c027f878,368,80286)
>> > _mtx_assert(c02ea000,9,c027f878,368,80286)
>> > mi_switch(c32c5da0,3,c02cea44,c357be98)
>> > ithread_schedule(c0747c00,1)
>> > sched_ithd(e)
>> > Xresume14()
>> > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
>> > trap(18, 10, 10,c01597b6,20)
>> > calltrap()
>> > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
>> > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
>> > ithread_loop(c0747c00,c357bfa8)
>> > fork_exit(c0146cbc,c0747c00,c357bfa8)
>> > fork_trampoline()
>>
>> *sigh* This is why enabling interrupts in trap() is such a bad idea.
>> If we get a trap in the scheduler, then lots of bad crap starts to
>> happen because we can get an interrupt while we are in a trap. :(
>> Can you compile your kernel with INVARIANTS on though, as I think the
>> kernel should've panic'd earlier if it is doing what I think it is
>> doing.
>
> It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.

Hmm, ouch, you don't want MUTEX_DEBUG, that'll slow your system to a crawl.

>> Also, if you are feeling industrious, edit sys/i386/i386/trap.c and
>> comment out the enable_intr() call near the beginning of the trap()
>> function right after the printf for 'kernel trap %d with interrupts
>> disabled'.
>
> Ok, I'll try that.
>
> -Maxim

It will still panic, just hopefully a better panic.

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: A possible bug in the interrupt thread preemption code [Was:
John Baldwin wrote:

> On 22-Feb-01 Maxim Sobolev wrote:
> > John Baldwin wrote:
> >
> >> On 22-Feb-01 Maxim Sobolev wrote:
> >> > Dag-Erling Smorgrav wrote:
> >> >
> >> >> Maxim Sobolev <[EMAIL PROTECTED]> writes:
> >> >> > It's not an ata specific problem, but rather a problem of all ISA
> >> >> > devices (I have an ISA based ata controller).
> >> >>
> >> >> I don't think it has anything to do with ISA. I've had similar
> >> >> problems on a PCI-only system (actually, PCI+EISA motherboard with no
> >> >> EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
> >> >> SCSI).
> >> >>
> >> >> Considering that backing out rev 1.14 of ithread.c eliminates the
> >> >> panics, and that that revision is supposed to enable interrupt thread
> >> >> preemption, and that the crashed kernels show signs of stack smashing,
> >> >> I'd say the cause is probably a bug in the preemption code.
> >> >
> >> > Update: the bug is still here, as of -current from 22 Feb. However,
> >> > this time it doesn't even boot into single-user, failing with the
> >> > following panic message:
> >> > kernel trap 12 with interrupts disabled
> >> > panic: mutex sched lock recursed at ../../kern/kern_synch.c:872
> >>
> >> E. That would be something that is leaking sched_lock. Hmm...
> >>
> >> Got a backtrace? What is really annoying is that preemption has been in the
> >> kernel since Feb 1. I just accidentally turned it off in the ithread code
> >> reorganization and then turned it back on. It was off for a few hours after
> >> only being on for 2 weeks, and now everyone magically has problems.
> >
> > Here it is (from DDB):
> > panic(c027de93,c0297409,c027f878,368,80286)
> > _mtx_assert(c02ea000,9,c027f878,368,80286)
> > mi_switch(c32c5da0,3,c02cea44,c357be98)
> > ithread_schedule(c0747c00,1)
> > sched_ithd(e)
> > Xresume14()
> > --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> > trap(18, 10, 10,c01597b6,20)
> > calltrap()
> > --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> > sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> > ithread_loop(c0747c00,c357bfa8)
> > fork_exit(c0146cbc,c0747c00,c357bfa8)
> > fork_trampoline()
>
> *sigh* This is why enabling interrupts in trap() is such a bad idea. If we
> get a trap in the scheduler, then lots of bad crap starts to happen because we
> can get an interrupt while we are in a trap. :( Can you compile your kernel
> with INVARIANTS on though, as I think the kernel should've panic'd earlier if
> it is doing what I think it is doing.

It already has INVARIANTS, MUTEX_DEBUG, WITNESS and WITNESS_DDB.

> Also, if you are feeling industrious, edit sys/i386/i386/trap.c and comment
> out the enable_intr() call near the beginning of the trap() function right
> after the printf for 'kernel trap %d with interrupts disabled'.

Ok, I'll try that.

-Maxim
Re: A possible bug in the interrupt thread preemption code [Was:
On 22-Feb-01 Dag-Erling Smorgrav wrote:
> John Baldwin <[EMAIL PROTECTED]> writes:
>> On 22-Feb-01 Maxim Sobolev wrote:
>> > Dag-Erling Smorgrav wrote:
>> >
>> >> Maxim Sobolev <[EMAIL PROTECTED]> writes:
>> >> > It's not an ata specific problem, but rather a problem of all ISA
>> >> > devices (I have an ISA based ata controller).
>> >>
>> >> I don't think it has anything to do with ISA. I've had similar
>> >> problems on a PCI-only system (actually, PCI+EISA motherboard with no
>> >> EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
>> >> SCSI).
>> >>
>> >> Considering that backing out rev 1.14 of ithread.c eliminates the
>> >> panics, and that that revision is supposed to enable interrupt thread
>> >> preemption, and that the crashed kernels show signs of stack smashing,
>> >> I'd say the cause is probably a bug in the preemption code.
>> >
>> > Update: the bug is still here, as of -current from 22 Feb. However,
>> > this time it doesn't even boot into single-user, failing with the
>> > following panic message:
>> > kernel trap 12 with interrupts disabled
>> > panic: mutex sched lock recursed at ../../kern/kern_synch.c:872
>>
>> E. That would be something that is leaking sched_lock. Hmm...
>
> I have another sched_lock-related problem which showed up over the
> weekend. Starting StarOffice 5.2 invariably triggers the following
> panic:
>
> root@aes /var/crash# gdb -k
> GNU gdb 4.18
> Copyright 1998 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-unknown-freebsd".
> (kgdb) source ~des/kgdb
> (kgdb) kernel 0
> IdlePTD 3526656
> initial pcb at 2cb980
> panicstr: from debugger
> panic messages:
> ---
> panic: mutex sched lock not owned at ../../posix4/ksched.c:215

Easy enough. It seems I missed adding sched_lock around a need_resched(). I'll
fix it in a second.

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
Re: A possible bug in the interrupt thread preemption code [Was:
On 22-Feb-01 Maxim Sobolev wrote:
> John Baldwin wrote:
>
>> On 22-Feb-01 Maxim Sobolev wrote:
>> > Dag-Erling Smorgrav wrote:
>> >
>> >> Maxim Sobolev <[EMAIL PROTECTED]> writes:
>> >> > It's not an ata specific problem, but rather a problem of all ISA
>> >> > devices (I have an ISA based ata controller).
>> >>
>> >> I don't think it has anything to do with ISA. I've had similar
>> >> problems on a PCI-only system (actually, PCI+EISA motherboard with no
>> >> EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
>> >> SCSI).
>> >>
>> >> Considering that backing out rev 1.14 of ithread.c eliminates the
>> >> panics, and that that revision is supposed to enable interrupt thread
>> >> preemption, and that the crashed kernels show signs of stack smashing,
>> >> I'd say the cause is probably a bug in the preemption code.
>> >
>> > Update: the bug is still here, as of -current from 22 Feb. However,
>> > this time it doesn't even boot into single-user, failing with the
>> > following panic message:
>> > kernel trap 12 with interrupts disabled
>> > panic: mutex sched lock recursed at ../../kern/kern_synch.c:872
>>
>> E. That would be something that is leaking sched_lock. Hmm...
>>
>> Got a backtrace? What is really annoying is that preemption has been in the
>> kernel since Feb 1. I just accidentally turned it off in the ithread code
>> reorganization and then turned it back on. It was off for a few hours after
>> only being on for 2 weeks, and now everyone magically has problems.
>
> Here it is (from DDB):
> panic(c027de93,c0297409,c027f878,368,80286)
> _mtx_assert(c02ea000,9,c027f878,368,80286)
> mi_switch(c32c5da0,3,c02cea44,c357be98)
> ithread_schedule(c0747c00,1)
> sched_ithd(e)
> Xresume14()
> --- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
> trap(18, 10, 10,c01597b6,20)
> calltrap()
> --- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
> sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
> ithread_loop(c0747c00,c357bfa8)
> fork_exit(c0146cbc,c0747c00,c357bfa8)
> fork_trampoline()

*sigh* This is why enabling interrupts in trap() is such a bad idea. If we
get a trap in the scheduler, then lots of bad crap starts to happen because we
can get an interrupt while we are in a trap. :( Can you compile your kernel
with INVARIANTS on though, as I think the kernel should've panic'd earlier if
it is doing what I think it is doing.

Also, if you are feeling industrious, edit sys/i386/i386/trap.c and comment out
the enable_intr() call near the beginning of the trap() function right after
the printf for 'kernel trap %d with interrupts disabled'.

> -Maxim

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
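The failure mode John describes (a trap taken inside the scheduler, followed
by an interrupt once trap() re-enables interrupts) can be shown with a toy
model in C. Everything here is hypothetical: struct toy_cpu and the toy_*
functions are invented for this sketch and are not the kernel's trap(),
mi_switch() or mutex code. The model only assumes that the scheduler path
holds a non-recursive lock with interrupts disabled and that the interrupt
path needs the same lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these are FreeBSD kernel structures. */
struct toy_cpu {
    bool intr_enabled;     /* are interrupts currently enabled? */
    bool intr_pending;     /* is a device interrupt waiting? */
    int  sched_lock_depth; /* how many times the toy sched lock is held */
    bool recursed;         /* set if the lock was taken while held */
};

static void toy_lock(struct toy_cpu *c)
{
    if (c->sched_lock_depth > 0)
        c->recursed = true;  /* a real spin mutex would panic here */
    c->sched_lock_depth++;
}

static void toy_unlock(struct toy_cpu *c)
{
    assert(c->sched_lock_depth > 0);
    c->sched_lock_depth--;
}

/* Deliver a pending interrupt; scheduling its thread needs the lock. */
static void toy_deliver_interrupt(struct toy_cpu *c)
{
    if (!c->intr_enabled || !c->intr_pending)
        return;
    c->intr_pending = false;
    toy_lock(c);
    toy_unlock(c);
}

/* Trap handler; enable_intr selects whether it re-enables interrupts. */
static void toy_trap(struct toy_cpu *c, bool enable_intr)
{
    if (enable_intr)
        c->intr_enabled = true;
    toy_deliver_interrupt(c);
}

/*
 * Simulate a fault inside the context-switch path, which runs with the
 * lock held and interrupts disabled.  Returns true if the lock recursed.
 */
static bool toy_fault_in_scheduler(bool trap_enables_intr)
{
    struct toy_cpu c = {
        .intr_enabled = false, .intr_pending = true,
        .sched_lock_depth = 0, .recursed = false
    };

    toy_lock(&c);
    toy_trap(&c, trap_enables_intr);
    toy_unlock(&c);
    return c.recursed;
}
```

In this model toy_fault_in_scheduler(true) reports a recursed lock, while
toy_fault_in_scheduler(false) does not, which is the motivation for the
suggestion to comment out enable_intr(). It says nothing about why the
original trap happens in the first place.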
Re: A possible bug in the interrupt thread preemption code [Was:
John Baldwin wrote:

> On 22-Feb-01 Maxim Sobolev wrote:
> > Dag-Erling Smorgrav wrote:
> >
> >> Maxim Sobolev <[EMAIL PROTECTED]> writes:
> >> > It's not an ata specific problem, but rather a problem of all ISA
> >> > devices (I have an ISA based ata controller).
> >>
> >> I don't think it has anything to do with ISA. I've had similar
> >> problems on a PCI-only system (actually, PCI+EISA motherboard with no
> >> EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
> >> SCSI).
> >>
> >> Considering that backing out rev 1.14 of ithread.c eliminates the
> >> panics, and that that revision is supposed to enable interrupt thread
> >> preemption, and that the crashed kernels show signs of stack smashing,
> >> I'd say the cause is probably a bug in the preemption code.
> >
> > Update: the bug is still here, as of -current from 22 Feb. However, this
> > time it doesn't even boot into single-user, failing with the following
> > panic message:
> > kernel trap 12 with interrupts disabled
> > panic: mutex sched lock recursed at ../../kern/kern_synch.c:872
>
> E. That would be something that is leaking sched_lock. Hmm...
>
> Got a backtrace? What is really annoying is that preemption has been in the
> kernel since Feb 1. I just accidentally turned it off in the ithread code
> reorganization and then turned it back on. It was off for a few hours after
> only being on for 2 weeks, and now everyone magically has problems.

Here it is (from DDB):

panic(c027de93,c0297409,c027f878,368,80286)
_mtx_assert(c02ea000,9,c027f878,368,80286)
mi_switch(c32c5da0,3,c02cea44,c357be98)
ithread_schedule(c0747c00,1)
sched_ithd(e)
Xresume14()
--- interrupt, eip = 0xc025b60f, esp = 0x80296, ebp = 0xc357bf08 ---
trap(18, 10, 10,c01597b6,20)
calltrap()
--- trap 0x9, eip = 0xc025a5de, esp = 0xc357bf50, ebp = 0xc357bf64 ---
sw1b(c0146cbc,c0146cbc,c32c5da0,c357bf94)
ithread_loop(c0747c00,c357bfa8)
fork_exit(c0146cbc,c0747c00,c357bfa8)
fork_trampoline()

-Maxim
Re: A possible bug in the interrupt thread preemption code [Was:
John Baldwin <[EMAIL PROTECTED]> writes:
> On 22-Feb-01 Maxim Sobolev wrote:
> > Dag-Erling Smorgrav wrote:
> >
> >> Maxim Sobolev <[EMAIL PROTECTED]> writes:
> >> > It's not an ata specific problem, but rather a problem of all ISA
> >> > devices (I have an ISA based ata controller).
> >>
> >> I don't think it has anything to do with ISA. I've had similar
> >> problems on a PCI-only system (actually, PCI+EISA motherboard with no
> >> EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
> >> SCSI).
> >>
> >> Considering that backing out rev 1.14 of ithread.c eliminates the
> >> panics, and that that revision is supposed to enable interrupt thread
> >> preemption, and that the crashed kernels show signs of stack smashing,
> >> I'd say the cause is probably a bug in the preemption code.
> >
> > Update: the bug is still here, as of -current from 22 Feb. Hovewer, this time
> > it even doesn't let to boot into single-user with following panic message:
> > kernel trap 12 with interrupts disabled
> > panic: mutex sched lock recursed at ../../kern/kern_synch.c:872
>
> E. That would be something that is leaking sched_lock. Hmm...

I have another sched_lock-related problem which showed up over the
weekend. Starting StarOffice 5.2 invariably triggers the following
panic:

root@aes /var/crash# gdb -k s
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd".
(kgdb) source ~des/kgdb
(kgdb) kernel 0
IdlePTD 3526656
initial pcb at 2cb980
panicstr: from debugger
panic messages:
---
panic: mutex sched lock not owned at ../../posix4/ksched.c:215
panic: from debugger
Uptime: 3m37s

dumping to dev ad0b, offset 262528
dump ata0: resetting devices .. done
127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
---
#0 dumpsys () at ../../kern/kern_shutdown.c:476
476 if (dumping++) {
(kgdb) where
#0 dumpsys () at ../../kern/kern_shutdown.c:476
#1 0xc0187a04 in boot (howto=260) at ../../kern/kern_shutdown.c:319
#2 0xc0187dd9 in panic (fmt=0xc02521b4 "from debugger") at ../../kern/kern_shutdown.c:569
#3 0xc011cdad in db_panic (addr=-1071459127, have_addr=0, count=-1, modif=0xc879cd9c "") at ../../ddb/db_command.c:433
#4 0xc011cd4b in db_command (last_cmdp=0xc0285420, cmd_table=0xc0285280, aux_cmd_tablep=0xc02b68bc) at ../../ddb/db_command.c:333
#5 0xc011ce12 in db_command_loop () at ../../ddb/db_command.c:455
#6 0xc011f07f in db_trap (type=3, code=0) at ../../ddb/db_trap.c:71
#7 0xc022d258 in kdb_trap (type=3, code=0, regs=0xc879ce9c) at ../../i386/i386/db_interface.c:164
#8 0xc023a098 in trap (frame={tf_fs = -1060962280, tf_es = -932118512, tf_ds = -1060962288, tf_edi = -1071197888, tf_esi = 256, tf_ebp = -931541272, tf_isp = -931541304, tf_ebx = 514, tf_edx = -1071149169, tf_ecx = -1070757120, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1071459127, tf_cs = 8, tf_eflags = 70, tf_esp = -1071149185, tf_ss = -1071240285}) at ../../i386/i386/trap.c:615
#9 0xc022d4c9 in Debugger (msg=0xc0262ba3 "panic") at machine/cpufunc.h:60
#10 0xc0187dd0 in panic (fmt=0xc0261a48 "mutex %s not owned at %s:%d") at ../../kern/kern_shutdown.c:567
#11 0xc0180c89 in _mtx_assert (m=0xc02e3e20, what=1, file=0xc026d140 "../../posix4/ksched.c", line=215)
---Type <return> to continue, or q <return> to quit---
    at ../../kern/kern_mutex.c:611
#12 0xc01f0d51 in ksched_yield (ret=0xc8712f24, ksched=0xc0a97660) at ../../posix4/ksched.c:215
#13 0xc01f100b in sched_yield (p=0xc8712dc0, uap=0xc879cf80) at ../../posix4/p1003_1b.c:225
#14 0xc023b239 in syscall (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = -1077939044, tf_esi = 706867048, tf_ebp = -1077939116, tf_isp = -931541036, tf_ebx = 714966384, tf_edx = 1, tf_ecx = 134979841, tf_eax = 158, tf_trapno = 22, tf_err = 2, tf_eip = 717073383, tf_cs = 31, tf_eflags = 514, tf_esp = -1077939144, tf_ss = 47}) at ../../i386/i386/trap.c:1191
#15 0xc022dbe3 in Xint0x80_syscall ()
#16 0x2a182a9e in ?? ()
#17 0x2a18a328 in ?? ()
#18 0x2a057f6b in ?? ()
#19 0x2a057eb5 in ?? ()
#20 0x28f5e2a9 in ?? ()
#21 0x28191db5 in ?? ()
#22 0x80513a3 in ?? ()
#23 0x28f55eab in ?? ()
#24 0x80512da in ?? ()
#25 0x2a059cf1 in ?? ()
#26 0x2a181e35 in ?? ()
---Type <return> to continue, or q <return> to quit---
#27 0x2ab551eb in ?? ()
(kgdb)

DES
--
Dag-Erling Smor
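Frames #10 to #12 show an ownership assertion firing with the format string "mutex %s not owned at %s:%d". As a rough illustration of what such a check does, here is a minimal user-space sketch in C. All names (`sim_mtx`, `sim_assert_owned`, `sim_yield`) are hypothetical, and the `take_lock` switch is purely an assumption for demonstration. The trace shows that the assertion failed at ksched.c:215, not why the lock was not held there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h> /* strcmp, for callers that compare the message */

/* Toy user-space model; none of these names exist in the FreeBSD kernel. */
#define SIM_NOOWNER (-1)

struct sim_mtx {
    const char *name; /* printable lock name, e.g. "sched lock" */
    int owner;        /* id of the holding thread, or SIM_NOOWNER */
};

/*
 * Return true if thread `tid` holds the lock. Otherwise fill `msg` with a
 * message in the format seen in frame #10 and return false (the real
 * kernel would panic at this point instead of returning).
 */
bool sim_assert_owned(const struct sim_mtx *m, int tid,
                      const char *file, int line,
                      char *msg, size_t msglen)
{
    assert(msg != NULL && msglen > 0);
    if (m->owner == tid) {
        msg[0] = '\0';
        return true;
    }
    snprintf(msg, msglen, "mutex %s not owned at %s:%d",
             m->name, file, line);
    return false;
}

/*
 * Hypothetical yield path: the caller either takes the lock first
 * (take_lock == true) or forgets to, and then runs the ownership check.
 * The file name and line number are copied from the panic message.
 */
bool sim_yield(struct sim_mtx *sched, int tid, bool take_lock,
               char *msg, size_t msglen)
{
    bool ok;

    if (take_lock)
        sched->owner = tid;
    ok = sim_assert_owned(sched, tid, "../../posix4/ksched.c", 215,
                          msg, msglen);
    if (take_lock)
        sched->owner = SIM_NOOWNER;
    return ok;
}
```

Calling `sim_yield` with `take_lock` set to false on a lock named "sched lock" fills `msg` with the same text as the panic line above; with `take_lock` set to true the check passes and `msg` is left empty.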
RE: A possible bug in the interrupt thread preemption code [Was:
On 22-Feb-01 Maxim Sobolev wrote:
> Dag-Erling Smorgrav wrote:
>
>> Maxim Sobolev <[EMAIL PROTECTED]> writes:
>> > It's not an ata specific problem, but rather a problem of all ISA
>> > devices (I have an ISA based ata controller).
>>
>> I don't think it has anything to do with ISA. I've had similar
>> problems on a PCI-only system (actually, PCI+EISA motherboard with no
>> EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
>> SCSI).
>>
>> Considering that backing out rev 1.14 of ithread.c eliminates the
>> panics, and that that revision is supposed to enable interrupt thread
>> preemption, and that the crashed kernels show signs of stack smashing,
>> I'd say the cause is probably a bug in the preemption code.
>
> Update: the bug is still here, as of -current from 22 Feb. However, this
> time it doesn't even boot into single-user mode, failing with the
> following panic message:
> kernel trap 12 with interrupts disabled
> panic: mutex sched lock recursed at ../../kern/kern_synch.c:872

E. That would be something that is leaking sched_lock. Hmm...

Got a backtrace? What is really annoying is that preemption has been in the
kernel since Feb 1. I just accidentally turned it off in the ithread code
reorganization and then turned it back on. It was off for a few hours after
only being on for 2 weeks, and now everyone magically has problems.

--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/
A possible bug in the interrupt thread preemption code [Was: Kernel panic in irq14: ata0]
Dag-Erling Smorgrav wrote:

> Maxim Sobolev <[EMAIL PROTECTED]> writes:
> > It's not an ata specific problem, but rather a problem of all ISA
> > devices (I have an ISA based ata controller).
>
> I don't think it has anything to do with ISA. I've had similar
> problems on a PCI-only system (actually, PCI+EISA motherboard with no
> EISA cards) with no ATA devices (disks, CD-ROM and streamer are all
> SCSI).
>
> Considering that backing out rev 1.14 of ithread.c eliminates the
> panics, and that that revision is supposed to enable interrupt thread
> preemption, and that the crashed kernels show signs of stack smashing,
> I'd say the cause is probably a bug in the preemption code.

Update: the bug is still here, as of -current from 22 Feb. However, this
time it doesn't even boot into single-user mode, failing with the following
panic message:

kernel trap 12 with interrupts disabled
panic: mutex sched lock recursed at ../../kern/kern_synch.c:872

syncing disks...

-Maxim