On Friday 14 May 2010 7:59:40 am Terry Kennedy wrote:
> > > The crash was a "page fault while in kernel mode" with the current process
> > > being the interrupt service routine for the bce0 GigE. Things progressed
> > > reasonably until partway through the dump, when the system locked up with
> > > a "Sleeping thread (tid 100028, pid 12) owns a non-sleepable lock". That's
> > > the same PID as reported in the main crash.
> >
> > Hmm.  You could try changing the code to not do a nested panic in that
> > case.  You would update subr_turnstile.c to just return if panicstr is
> > not NULL rather than calling panic.  However, there is still a good
> > chance you will end up deadlocking in that case.  I have another patch I
> > can send you next week that prevents blocking on mutexes during a panic
> > which may also help.
> 
>   Ok, I'll be glad to try that.
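
To make the subr_turnstile.c suggestion concrete, here is a userland sketch of the guard. It assumes the sleeping-owner check prints the "Sleeping thread ... owns a non-sleepable lock" message and then calls panic(). The suggested change is to return at that point if panicstr is already set. panicstr mirrors the kernel global of that name, but model_panic() and report_sleeping_owner() are hypothetical stand-ins, not the real turnstile code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userland model only. panicstr mirrors the kernel global;
 * model_panic() and report_sleeping_owner() are hypothetical
 * stand-ins, not functions from subr_turnstile.c.
 */
static const char *panicstr = NULL;  /* non-NULL once a panic has started */
static int panic_calls = 0;          /* how many times panic was entered */

static void model_panic(const char *msg)
{
    panic_calls++;
    if (panicstr == NULL)
        panicstr = msg;              /* the first panic records its message */
}

/*
 * Called when a lock owner is found asleep.  Returns true if this call
 * started a new panic, false if it bailed out because a panic was
 * already in progress.
 */
static bool report_sleeping_owner(int tid, int pid)
{
    printf("Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
        tid, pid);
    /* Suggested change: never start a nested panic during a panic. */
    if (panicstr != NULL)
        return false;
    model_panic("sleeping thread");
    assert(panicstr != NULL);        /* model_panic() recorded the message */
    return true;
}
```

The first report starts a panic; any later report returns early, so panic_calls stays at one instead of recursing.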

--- //depot/vendor/freebsd/src/sys/kern/kern_mutex.c    2010/01/23 15:55:14
+++ //depot/projects/smpng/sys/kern/kern_mutex.c        2010/03/10 22:33:24
@@ -348,6 +348,15 @@
                return;
        }
 
+       /*
+        * If we have already panic'd and this is the thread that called
+        * panic(), then don't block on any mutexes but silently succeed.
+        * Otherwise, the kernel will deadlock since the scheduler isn't
+        * going to run the thread that holds the lock we need.
+        */
+       if (panicstr != NULL && curthread->td_flags & TDF_INPANIC)
+               return;
+
        lock_profile_obtain_lock_failed(&m->lock_object,
                    &contested, &waittime);
        if (LOCK_LOG_TEST(&m->lock_object, opts))
@@ -664,6 +673,15 @@
        }
 
        /*
+        * If we failed to unlock this lock and we are a thread that has
+        * called panic(), it may be due to the bypass in _mtx_lock_sleep()
+        * above.  In that case, just return and leave the lock alone to
+        * avoid changing the state.
+        */
+       if (panicstr != NULL && curthread->td_flags & TDF_INPANIC)
+               return;
+
+       /*
         * We have to lock the chain before the turnstile so this turnstile
         * can be removed from the hash list if it is empty.
         */
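
The two guards in the patch can be exercised outside the kernel with a small model. This is a sketch under stated assumptions. struct mtx, struct thread, mtx_lock_model() and mtx_unlock_model() are hypothetical simplifications: an owner pointer replaces the real lock word, and returning false replaces blocking on a turnstile. Only the names panicstr and TDF_INPANIC come from the patch, and the flag value below is made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the bypass; NOT the real kern_mutex.c. */
#define TDF_INPANIC 0x1              /* made-up value for illustration */

struct thread { int td_flags; int tid; };
struct mtx { struct thread *owner; };  /* simplified: just an owner */

static const char *panicstr = NULL;  /* non-NULL once panic() was called */

/* Returns true if the caller may proceed as though it holds the lock. */
static bool mtx_lock_model(struct mtx *m, struct thread *td)
{
    if (m->owner == NULL) {
        m->owner = td;               /* uncontested: take ownership */
        return true;
    }
    /* Contested: the panicking thread "succeeds" without owning it. */
    if (panicstr != NULL && (td->td_flags & TDF_INPANIC))
        return true;
    return false;                    /* stand-in for blocking forever */
}

static void mtx_unlock_model(struct mtx *m, struct thread *td)
{
    if (m->owner == td) {
        m->owner = NULL;             /* normal release */
        return;
    }
    /* Lock was bypassed above; leave its state alone. */
    if (panicstr != NULL && (td->td_flags & TDF_INPANIC))
        return;
    assert(!"unlock of a mutex not owned by this thread");
}
```

Before panicstr is set, a contested lock attempt returns false, which is where the kernel would sleep. Once it is set, the thread flagged TDF_INPANIC gets true back without taking ownership, and its later unlock leaves the real owner untouched.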

> > > 3) Is there any way to rig the system to obtain more info if this happens
> > > again? Right now I'm using an embedded remote console server, but I could
> > > switch the system to a serial port if enabling the kernel debugger might
> > > help. But I think that the sleeping thread bit would happen even at the
> > > debugger prompt, wouldn't it?
> >
> > Include DDB and enable the 'trace_on_panic' sysctl knob perhaps.
> 
>   Hmmm. Do you think it will get very far before the sleeping thread business
> locks it up?

It should be able to print the backtrace when it panics at least.

-- 
John Baldwin
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
