Re: VOP_GETATTR panic on Alpha
On 17-Jul-2002 Bruce Evans wrote:
> On Tue, 16 Jul 2002, John Baldwin wrote:
>> On 17-Jul-2002 Bruce Evans wrote:
>>>>     mtx_lock_spin(&sched_lock);
>>>>     if (cold || panicstr) {
>>>>         /*
>>>>          * After a panic, or during autoconfiguration,
>>>>          * just give interrupts a chance, then just return;
>>>
>>> This is the rotted comment.  No chance is given here.
>>
>> Well, when you unlock sched_lock you give ithreads a chance to run.
>> (This is only true in a fully preemptive kernel though.)
>
> It now only releases the lock that it acquired.  splx(safepri) gave a
> nesting-violating unlocking corresponding to releasing the caller(s)
> locks.  However, it is probably a bug to call msleep() with sched_lock
> held, so releasing sched_lock would release it completely but not give
> interrupts any better chance than they had to begin with.

Well, the only other lock the caller can hold is the mutex passed in,
and we release that as well, which could free up ithreads blocked on
that lock.

> Bruce

--
John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: VOP_GETATTR panic on Alpha
Dag-Erling Smorgrav writes:
> Andrew Gallatin [EMAIL PROTECTED] writes:
>> Welcome to hell.
>
> Thanks, it sure looks cozy in here :)
>
>> If you clear panicstr, you have a chance of getting a dump.
>
> How do I do that?

Just clear panicstr (w panicstr 0) when you drop into the debugger on a
panic.

Drew
Re: VOP_GETATTR panic on Alpha
Andrew Gallatin [EMAIL PROTECTED] writes:
> Just clear panicstr (w panicstr 0) when you drop into the debugger on
> a panic.

No luck.

However, I added an ASSERT_VOP_LOCKED() to vn_statfile(), and confirmed
that vn_lock() fails to lock the vnode.  Unfortunately, without a dump
it's hard to tell exactly what kind of vnode we're dealing with.  I
thought the problem might be an incorrect vop_lock implementation in
devfs (the only synthetic filesystem mounted on the box), but devfs
uses std_{islocked,lock,unlock}, and visual inspection didn't uncover
any obvious flaws in any of those.

Any other suggestions before I just turn off DEBUG_VFS_LOCKS?

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: VOP_GETATTR panic on Alpha
Hi!

On Tue, Jul 16, 2002 at 02:45:11PM +0200, Dag-Erling Smorgrav wrote:
> The following panic is 100% reproducible - it happens whenever I boot
> a recent kernel on Alpha, just before init(8) starts getty(8) on the
> console:

Sorry, a kernel from today's sources at 17:38 works just fine.

Another question about kernel core dumps: what should I do to get
one?-)  Why does a panic from the debugger on i386 give a core dump and
reboot the system, while a panic from the debugger on Alpha does not?

Andrew.
Re: VOP_GETATTR panic on Alpha
On 16 Jul, Dag-Erling Smorgrav wrote:
> Andrew Gallatin [EMAIL PROTECTED] writes:
>> Just clear panicstr (w panicstr 0) when you drop into the debugger
>> on a panic.
>
> No luck.
>
> However, I added an ASSERT_VOP_LOCKED() to vn_statfile(), and
> confirmed that vn_lock() fails to lock the vnode.  Unfortunately,
> without a dump it's hard to tell exactly what kind of vnode we're
> dealing with.  I thought the problem might be an incorrect vop_lock
> implementation in devfs (the only synthetic filesystem mounted on the
> box), but devfs uses std_{islocked,lock,unlock}, and visual
> inspection didn't uncover any obvious flaws in any of those.
>
> Any other suggestions before I just turn off DEBUG_VFS_LOCKS?

Turn it off.  I ran into similar problems about a week ago and the
author told me that this option was broken because only part of the
stuff in his private source tree was committed.  I haven't heard
anything since then to make me think that this option has been fixed.
Re: VOP_GETATTR panic on Alpha
Andrew Kolchoogin [EMAIL PROTECTED] writes:
> Sorry, a kernel from today's sources at 17:38 works just fine.

Try with DEBUG_VFS_LOCKS.

DES
--
Dag-Erling Smorgrav - [EMAIL PROTECTED]
Re: VOP_GETATTR panic on Alpha
Andrew Kolchoogin writes:
> Hi!
>
> On Tue, Jul 16, 2002 at 02:45:11PM +0200, Dag-Erling Smorgrav wrote:
>> The following panic is 100% reproducible - it happens whenever I
>> boot a recent kernel on Alpha, just before init(8) starts getty(8)
>> on the console:
>
> Sorry, a kernel from today's sources at 17:38 works just fine.
>
> Another question about kernel core dumps: what should I do to get
> one?-)  Why does a panic from the debugger on i386 give a core dump
> and reboot the system, while a panic from the debugger on Alpha does
> not?

Because, as BDE says, that crashdumps work at all is mostly accidental.

On alpha, a random kernel thread is waking up, and is unable to go back
to sleep because of the panicstr hack in msleep:

    mtx_lock_spin(&sched_lock);
    if (cold || panicstr) {
        /*
         * After a panic, or during autoconfiguration,
         * just give interrupts a chance, then just return;
         * don't run any other procs or panic below,
         * in case this is the idle process and already asleep.
         */
        if (mtx != NULL && priority & PDROP)
            mtx_unlock(mtx);
        mtx_unlock_spin(&sched_lock);
        return (0);
    }

We need to somehow let only interrupt threads and the panic'ed process
run after a panic.  I have no idea how to do this in a clean,
low-impact way.

Drew

PS: I was trying to make crashdumps fail on x86 by increasing HZ.  But
I cannot.  I have no idea why this only happens on alpha.
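The early return quoted above can be modelled in user space to see why a thread that wakes up after a panic never blocks again: every sleep request returns immediately. This is only a minimal sketch, not the kernel code. `struct fake_mtx`, `sleep_short_circuits()` and the `PDROP` value are stand-ins invented for illustration; only the shape of the test follows the quoted fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PDROP 0x200             /* stand-in value for this sketch */

/* Hypothetical stand-in for a kernel mutex: just records its state. */
struct fake_mtx {
	bool locked;
};

static bool cold;               /* true during autoconfiguration */
static const char *panicstr;    /* non-NULL once a panic has happened */

/*
 * Models the top of msleep() as quoted in the thread.  Returns true
 * when the sleep is skipped, i.e. the caller gets control straight
 * back instead of blocking.
 */
static bool
sleep_short_circuits(struct fake_mtx *mtx, int priority)
{
	if (cold || panicstr != NULL) {
		/* The caller's mutex is only dropped when PDROP is set. */
		if (mtx != NULL && (priority & PDROP))
			mtx->locked = false;
		return true;
	}
	return false;   /* the normal path would block here */
}
```

A daemon loop that calls the sleep routine and goes round again would therefore spin once `panicstr` is set, which matches the symptom Drew describes.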
Re: VOP_GETATTR panic on Alpha
Hi!

On Tue, Jul 16, 2002 at 05:59:16PM +0200, Dag-Erling Smorgrav wrote:
>> Sorry, a kernel from today's sources at 17:38 works just fine.
>
> Try with DEBUG_VFS_LOCKS.

Well, call me the lamest programmer in the world. :)

My Alpha DOESN'T go to the debugger.  Instead it hangs somewhere in the
internals of the kernel.  A break on the serial console should bring me
to the debugger, but instead the machine responds to neither the
network nor the serial console.

Andrew.

P.S. Well, I'll try to get a kernel core dump and analyse it using the
debugger.
Re: VOP_GETATTR panic on Alpha
Andrew,

On Tue, Jul 16, 2002 at 01:46:16PM -0400, Andrew Gallatin wrote:
> PS: I was trying to make crashdumps fail on x86 by increasing HZ.
> But I cannot.  I have no idea why this only happens on alpha.

Do you have any ideas about what we should test?-)

Andrew.
Re: VOP_GETATTR panic on Alpha
On 16-Jul-2002 Andrew Gallatin wrote:
> Andrew Kolchoogin writes:
>> Hi!
>>
>> On Tue, Jul 16, 2002 at 02:45:11PM +0200, Dag-Erling Smorgrav wrote:
>>> The following panic is 100% reproducible - it happens whenever I
>>> boot a recent kernel on Alpha, just before init(8) starts getty(8)
>>> on the console:
>>
>> Sorry, a kernel from today's sources at 17:38 works just fine.
>>
>> Another question about kernel core dumps: what should I do to get
>> one?-)  Why does a panic from the debugger on i386 give a core dump
>> and reboot the system, while a panic from the debugger on Alpha does
>> not?
>
> Because, as BDE says, that crashdumps work at all is mostly
> accidental.
>
> On alpha, a random kernel thread is waking up, and is unable to go
> back to sleep because of the panicstr hack in msleep:
>
>     mtx_lock_spin(&sched_lock);
>     if (cold || panicstr) {
>         /*
>          * After a panic, or during autoconfiguration,
>          * just give interrupts a chance, then just return;
>          * don't run any other procs or panic below,
>          * in case this is the idle process and already asleep.
>          */
>         if (mtx != NULL && priority & PDROP)
>             mtx_unlock(mtx);
>         mtx_unlock_spin(&sched_lock);
>         return (0);
>     }
>
> We need to somehow let only interrupt threads and the panic'ed
> process run after a panic.  I have no idea how to do this in a clean,
> low-impact way.

It's probably preemption.  However, the problem may be that you can't
switch to the ithread if you just turn preemption on for panics,
because the ithread may already be on the run queue, which may be why
your earlier patch did not work.  Try to see if it works ok if you turn
on preemption fully by making the second arg to ithread_schedule() be
!cold.  FWIW, I only have problems with preemption on alphas if I use
SMP.

--
John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
Re: VOP_GETATTR panic on Alpha
On 16-Jul-2002 Andrew Gallatin wrote:
> Alfred Perlstein writes:
>>> We need to somehow let only interrupt threads and the panic'ed
>>> process run after a panic.  I have no idea how to do this in a
>>> clean, low-impact way.
>>>
>>> Drew
>>>
>>> PS: I was trying to make crashdumps fail on x86 by increasing HZ.
>>> But I cannot.  I have no idea why this only happens on alpha.
>>
>> um, pseudocode...
>>
>> for ithreads,
>>     td->td_flags |= TD_ITHREAD
>> for panicing thread,
>>     td->td_flags |= TD_INPANIC
>>
>> if ((cold || panicstr) &&
>>     (td->td_flags & (TD_ITHREAD|TD_INPANIC)) != 0) {
>
> I have no idea what's planned for td_flags.  Is stealing 2 values for
> this use acceptable?  I didn't consider touching the flags to be
> lightweight..
>
> If so, I was thinking more like
>
> #define TDF_PANICSCHED 0x02 /* may be scheduled during/after a panic */

You can already do if (td->td_ithd != NULL) to do the TD_ITHREAD test.
The problem is that this won't work if there is a process on the run
queue with a higher priority than the currently running process.

--
John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
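The two tests discussed in this message (a flag bit for the panicking thread, and a non-NULL `td_ithd` pointer standing in for an "is an ithread" flag) combine into a single predicate. The following is a sketch under stated assumptions: `struct thread_sketch` and `struct ithd_sketch` are cut-down stand-ins, `may_run_after_panic()` is a hypothetical helper name, and `TDF_PANICSCHED` takes the value proposed in the mail rather than anything committed to the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TDF_PANICSCHED 0x02     /* may be scheduled during/after a panic */

/* Cut-down stand-ins for the kernel structures (illustration only). */
struct ithd_sketch {
	int unused;
};

struct thread_sketch {
	int td_flags;
	struct ithd_sketch *td_ithd;    /* non-NULL for interrupt threads */
};

/*
 * Hypothetical helper: a thread is allowed to run after a panic if it
 * is an interrupt thread or has been explicitly flagged.
 */
static bool
may_run_after_panic(const struct thread_sketch *td)
{
	return td->td_ithd != NULL ||
	    (td->td_flags & TDF_PANICSCHED) != 0;
}
```

As John notes, a predicate like this only classifies threads; it does not by itself stop a higher-priority runnable thread from being picked first.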
Re: VOP_GETATTR panic on Alpha
John Baldwin writes:
>> We need to somehow let only interrupt threads and the panic'ed
>> process run after a panic.  I have no idea how to do this in a
>> clean, low-impact way.
>
> It's probably preemption.  However, the problem may be that you can't
> switch to the ithread if you just turn preemption on for panics,
> because the ithread may already be on the run queue, which may be why
> your earlier patch did not work.  Try to see if it works ok if you
> turn on preemption fully by making the second arg to
> ithread_schedule() be !cold.  FWIW, I only have problems with
> preemption on alphas if I use SMP.

I think it's more than this.  I tried unconditionally enabling
preemption, and I see no improvement.  After a panic, it wedges, and I
see this:

db> c
syncing disks... done
Uptime: 4m26s
[halt sent]
Stopped at siointr1+0x198: br zero,siointr1+0x330 zero=0x0
db> tr
siointr1() at siointr1+0x198
siointr() at siointr+0x40
isa_handle_fast_intr() at isa_handle_fast_intr+0x24
alpha_dispatch_intr() at alpha_dispatch_intr+0xd0
interrupt() at interrupt+0x110
XentInt() at XentInt+0x28
--- interrupt (from ipl 0) ---
critical_exit() at critical_exit+0x20
_mtx_unlock_spin_flags() at _mtx_unlock_spin_flags+0xc4
msleep() at msleep+0x2b0
buf_daemon() at buf_daemon+0x1f4
fork_exit() at fork_exit+0xe0
exception_return() at exception_return
--- root of call graph ---

So it's still stuck in msleep.  How is it supposed to get back to the
panic'ed thread if a system thread wakes up and is not allowed to go
back to sleep???

Drew
Re: VOP_GETATTR panic on Alpha
On 16-Jul-2002 Andrew Gallatin wrote:
> John Baldwin writes:
>>> We need to somehow let only interrupt threads and the panic'ed
>>> process run after a panic.  I have no idea how to do this in a
>>> clean, low-impact way.
>>
>> It's probably preemption.  However, the problem may be that you
>> can't switch to the ithread if you just turn preemption on for
>> panics, because the ithread may already be on the run queue, which
>> may be why your earlier patch did not work.  Try to see if it works
>> ok if you turn on preemption fully by making the second arg to
>> ithread_schedule() be !cold.  FWIW, I only have problems with
>> preemption on alphas if I use SMP.
>
> I think it's more than this.  I tried unconditionally enabling
> preemption, and I see no improvement.  After a panic, it wedges, and
> I see this:
>
> db> c
> syncing disks... done
> Uptime: 4m26s
> [halt sent]
> Stopped at siointr1+0x198: br zero,siointr1+0x330 zero=0x0
> db> tr
> siointr1() at siointr1+0x198
> siointr() at siointr+0x40
> isa_handle_fast_intr() at isa_handle_fast_intr+0x24
> alpha_dispatch_intr() at alpha_dispatch_intr+0xd0
> interrupt() at interrupt+0x110
> XentInt() at XentInt+0x28
> --- interrupt (from ipl 0) ---
> critical_exit() at critical_exit+0x20
> _mtx_unlock_spin_flags() at _mtx_unlock_spin_flags+0xc4
> msleep() at msleep+0x2b0
> buf_daemon() at buf_daemon+0x1f4
> fork_exit() at fork_exit+0xe0
> exception_return() at exception_return
> --- root of call graph ---
>
> So it's still stuck in msleep.  How is it supposed to get back to the
> panic'ed thread if a system thread wakes up and is not allowed to go
> back to sleep???

Hm.  Surprised we don't see this on other archs then (or maybe we
do...).  Probably when we have panic'd (and after we leave the debugger
and go into boot() or some such) we should take any non-P_SYSTEM
processes off the run queues and then remove the panicstr checks from
msleep() and the condition variable wait functions.

Perhaps better is to dink around in choosethread() so that if panicstr
is set, we throw away any threads we get that are neither P_SYSTEM nor
flagged TDF_INPANIC.  By throw away, I mean that we just ignore any
such threads and loop if we get one we want to throw away.

--
John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
Re: VOP_GETATTR panic on Alpha
John Baldwin writes:
>> So it's still stuck in msleep.  How is it supposed to get back to
>> the panic'ed thread if a system thread wakes up and is not allowed
>> to go back to sleep???
>
> Hm.  Surprised we don't see this on other archs then (or maybe we
> do...).  Probably when we have panic'd (and after we leave the
> debugger and go into boot() or some such) we should take any
> non-P_SYSTEM processes off the run queues and then remove the
> panicstr checks from msleep() and the condition variable wait
> functions.

Do you have something like the following pseudocode in mind?  Perhaps
placed just prior to the call to boot() in panic()?

    foreach p in (all procs in system) {
        if (p == curproc)
            continue;
        if (p->p_flag & P_SYSTEM)
            continue;
        foreach td in (all threads in p)
            if (td->td_state == TDS_RUNQ)
                remrunqueue(td);
    }

I assume a panic will IPI other processors and halt them in their
tracks so we don't need to worry too much about locking?

> Perhaps better is to dink around in choosethread() so that if
> panicstr is set, we throw away any threads we get that are neither
> P_SYSTEM nor flagged TDF_INPANIC.  By throw away, I mean that we just
> ignore any such threads and loop if we get one we want to throw away.

I think that I like your first idea better..

Drew
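Drew's pseudocode can be exercised outside the kernel with a flat array of threads in place of the process and thread lists. This is an illustrative sketch only: the structures are cut-down stand-ins, the `P_SYSTEM` value and the `TDS_OFFQ` state are invented for the example, and marking a thread `TDS_OFFQ` merely models what `remrunqueue()` would do.

```c
#include <assert.h>
#include <stddef.h>

#define P_SYSTEM 0x1            /* hypothetical value for this sketch */

enum td_state_sketch { TDS_SLEEPING, TDS_RUNQ, TDS_OFFQ };

struct proc_sketch {
	int p_flag;
};

struct thread_sketch {
	struct proc_sketch *td_proc;
	enum td_state_sketch td_state;
};

/*
 * Walk every thread and pull runnable ones off the (modelled) run
 * queue, sparing the panicking process and P_SYSTEM processes.
 * Returns the number of threads removed.
 */
static size_t
purge_run_queue(struct thread_sketch *tds, size_t n,
    const struct proc_sketch *curproc)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		struct thread_sketch *td = &tds[i];

		if (td->td_proc == curproc)
			continue;       /* the panicking process */
		if (td->td_proc->p_flag & P_SYSTEM)
			continue;       /* kernel daemons stay runnable */
		if (td->td_state == TDS_RUNQ) {
			td->td_state = TDS_OFFQ;    /* models remrunqueue(td) */
			removed++;
		}
	}
	return removed;
}
```

The sketch ignores locking and other CPUs entirely, which is exactly the open question raised in the message.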
Re: VOP_GETATTR panic on Alpha
On 16-Jul-2002 Andrew Gallatin wrote:
> John Baldwin writes:
>>> So it's still stuck in msleep.  How is it supposed to get back to
>>> the panic'ed thread if a system thread wakes up and is not allowed
>>> to go back to sleep???
>>
>> Hm.  Surprised we don't see this on other archs then (or maybe we
>> do...).  Probably when we have panic'd (and after we leave the
>> debugger and go into boot() or some such) we should take any
>> non-P_SYSTEM processes off the run queues and then remove the
>> panicstr checks from msleep() and the condition variable wait
>> functions.
>
> Do you have something like the following pseudocode in mind?  Perhaps
> placed just prior to the call to boot() in panic()?
>
>     foreach p in (all procs in system) {
>         if (p == curproc)
>             continue;
>         if (p->p_flag & P_SYSTEM)
>             continue;
>         foreach td in (all threads in p)
>             if (td->td_state == TDS_RUNQ)
>                 remrunqueue(td);
>     }
>
> I assume a panic will IPI other processors and halt them in their
> tracks so we don't need to worry too much about locking?

Hmm, it doesn't atm.  Debugger() does, but panic doesn't.

>> Perhaps better is to dink around in choosethread() so that if
>> panicstr is set, we throw away any threads we get that are neither
>> P_SYSTEM nor flagged TDF_INPANIC.  By throw away, I mean that we
>> just ignore any such threads and loop if we get one we want to throw
>> away.
>
> I think that I like your first idea better..

I like my second, it is easier, just add this to choosethread:

    if (panicstr && ((td->td_proc->p_flag & P_SYSTEM) == 0 &&
        (td->td_flags & TDF_INPANIC) == 0))
        goto top;

(Do this just before the td_state change and return()).  You of course
need to set TDF_INPANIC when setting panicstr.  This might mean that we
break the restartable panics case.  In that case you would just change
this so that runq_choose() actually does the work to ignore threads on
the run queue instead, which eliminates the loop and ugly goto and
preserves the runqueue state in the core dump.

> Drew

--
John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
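The filter John proposes is easiest to see in the variant he mentions last, where the queue scan itself skips unwanted threads instead of dequeuing them and looping with a goto. The sketch below models that with an array for the run queue. It is not the scheduler code: `struct td_sketch` flattens the `td->td_proc->p_flag` indirection into one struct, both flag values are hypothetical, and `choose_thread()` is an invented name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define P_SYSTEM    0x1         /* hypothetical value for this sketch */
#define TDF_INPANIC 0x2         /* hypothetical value for this sketch */

/* Cut-down thread: process flags are folded in for brevity. */
struct td_sketch {
	int p_flag;
	int td_flags;
};

/*
 * Return the first eligible thread in queue order.  After a panic,
 * threads that are neither P_SYSTEM nor TDF_INPANIC are passed over
 * but left on the queue, so the queue state survives into a dump.
 */
static struct td_sketch *
choose_thread(struct td_sketch **runq, size_t n, bool panicked)
{
	for (size_t i = 0; i < n; i++) {
		struct td_sketch *td = runq[i];

		if (panicked &&
		    (td->p_flag & P_SYSTEM) == 0 &&
		    (td->td_flags & TDF_INPANIC) == 0)
			continue;       /* ignore, do not dequeue */
		return td;
	}
	return NULL;    /* nothing eligible; a real kernel would idle */
}
```

Before a panic the scan behaves like an ordinary first-in-queue pick; only the `panicked` argument changes which entries are visible.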
Re: VOP_GETATTR panic on Alpha
John Baldwin writes:
> I like my second, it is easier, just add this to choosethread:

Don't all these compares in the critical path add up?

>     if (panicstr && ((td->td_proc->p_flag & P_SYSTEM) == 0 &&
>         (td->td_flags & TDF_INPANIC) == 0))
>         goto top;
>
> (Do this just before the td_state change and return()).  You of
> course need to set TDF_INPANIC when setting panicstr.  This might
> mean that we break the restartable panics case.  In that case you
> would just change this so that runq_choose() actually does the work
> to ignore threads on the run queue instead, which eliminates the loop
> and ugly goto and preserves the runqueue state in the core dump.

I assume I also need to remove the panicstr check in at least msleep.

Drew
Re: VOP_GETATTR panic on Alpha
On Tue, 16 Jul 2002, Andrew Gallatin wrote:
> Andrew Kolchoogin writes:
>> Why does a panic from the debugger on i386 give a core dump and
>> reboot the system, while a panic from the debugger on Alpha does
>> not?
>
> Because, as BDE says, that crashdumps work at all is mostly
> accidental.

Er, I meant that working of syncs in panic() is mostly accidental.
Panic dumps should not be affected, since they should involve little
more than the driver's dump routine, which should not depend on
interrupts or context switching working.  Dump routines must use
polling only, and run with some sort of lock to prevent context
switching.  splhigh() is used in RELENG_4.  sched_lock should probably
be used in -current, but there seems to be only a (null) splhigh().

This could also be just a driver problem.  I know the old wddump
routine worked right but am not sure about any of the current ones.
Maybe dumps are broken on the alpha only due to driver problems.  Note
that the splhigh() didn't actually lock out interrupts in RELENG_4 for
drivers broken enough to call tsleep().  The [un]safepri hack in
tsleep() may permit broken dump routines that call tsleep() to work.
This hack has been lost in -current except for rotted comments which
still say that it is done.

> On alpha, a random kernel thread is waking up, and is unable to go
> back to sleep because of the panicstr hack in msleep:
>
>     mtx_lock_spin(&sched_lock);
>     if (cold || panicstr) {
>         /*
>          * After a panic, or during autoconfiguration,
>          * just give interrupts a chance, then just return;

This is the rotted comment.  No chance is given here.

>          * don't run any other procs or panic below,
>          * in case this is the idle process and already asleep.

Looks like more bitrot.  We've learned that the idle process can't call
here.

>          */
>         if (mtx != NULL && priority & PDROP)
>             mtx_unlock(mtx);
>         mtx_unlock_spin(&sched_lock);

The safepri hack (splx(safepri); splx(origpri);) was here instead of
these mtx operations.

>         return (0);
>     }
>
> We need to somehow let only interrupt threads and the panic'ed
> process run after a panic.  I have no idea how to do this in a clean,
> low-impact way.

I don't want to do this since I think there is no clean way to do it.
But crash dumps must work without using interrupt threads, etc.  I
think the right way to do the sync is to always do a crash dump and
have fsck_*fs recover buffers from it rather than let the panicking
kernel possibly create further damage.  But changing fsck_*fs to do
this would be a lot of work.

Bruce
Re: VOP_GETATTR panic on Alpha
On 17-Jul-2002 Bruce Evans wrote:
> This could also be just a driver problem.  I know the old wddump
> routine worked right but am not sure about any of the current ones.
> Maybe dumps are broken on the alpha only due to driver problems.
> Note that the splhigh() didn't actually lock out interrupts in
> RELENG_4 for drivers broken enough to call tsleep().  The [un]safepri
> hack in tsleep() may permit broken dump routines that call tsleep()
> to work.  This hack has been lost in -current except for rotted
> comments which still say that it is done.

Agreed, if drivers depend on interrupts to work for dumps that is a
Bug (tm).

>> On alpha, a random kernel thread is waking up, and is unable to go
>> back to sleep because of the panicstr hack in msleep:
>>
>>     mtx_lock_spin(&sched_lock);
>>     if (cold || panicstr) {
>>         /*
>>          * After a panic, or during autoconfiguration,
>>          * just give interrupts a chance, then just return;
>
> This is the rotted comment.  No chance is given here.

Well, when you unlock sched_lock you give ithreads a chance to run.
(This is only true in a fully preemptive kernel though.)

>>          * don't run any other procs or panic below,
>>          * in case this is the idle process and already asleep.
>
> Looks like more bitrot.  We've learned that the idle process can't
> call here.

Yes.

>>          */
>>         if (mtx != NULL && priority & PDROP)
>>             mtx_unlock(mtx);
>>         mtx_unlock_spin(&sched_lock);
>
> The safepri hack (splx(safepri); splx(origpri);) was here instead of
> these mtx operations.

Probably to truly emulate this we should always release the 'mtx' mutex
and then reacquire it if PDROP isn't specified.

>>         return (0);
>>     }
>>
>> We need to somehow let only interrupt threads and the panic'ed
>> process run after a panic.  I have no idea how to do this in a
>> clean, low-impact way.
>
> I don't want to do this since I think there is no clean way to do it.
> But crash dumps must work without using interrupt threads, etc.  I
> think the right way to do the sync is to always do a crash dump and
> have fsck_*fs recover buffers from it rather than let the panicking
> kernel possibly create further damage.  But changing fsck_*fs to do
> this would be a lot of work.

I agree that this would be the best solution for the long term if we
can have it.

--
John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
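John's suggestion for emulating the old safepri behaviour (always release the caller's mutex, then take it back unless PDROP was requested) differs from the quoted code, which only touches the mutex when PDROP is set. A small model makes the difference concrete. Everything here is an assumption for illustration: `struct fake_mtx` just counts releases, the `PDROP` value is a stand-in, and `panic_path_release()` is an invented name, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define PDROP 0x200             /* stand-in value for this sketch */

/* Hypothetical mutex model that records how often it was released. */
struct fake_mtx {
	int locked;
	int releases;
};

/*
 * Models the suggested panic/cold path: the mutex is always dropped
 * once, giving anything blocked on it a window to run, and is then
 * reacquired unless the caller asked for it to stay dropped.
 */
static void
panic_path_release(struct fake_mtx *mtx, int priority)
{
	if (mtx == NULL)
		return;
	mtx->locked = 0;
	mtx->releases++;
	if ((priority & PDROP) == 0)
		mtx->locked = 1;        /* caller expects to still hold it */
}
```

In both cases the caller sees the lock state it asked for on return; the only change from the quoted code is the release window in the non-PDROP case.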