Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-14 Thread Steven Hartland


- Original Message - 
From: "Attilio Rao" 

Anyway, we really would need much more information in order to take
proactive action.



Would it be possible to get access to one of the panicking machines? Is it
always the same panic, or does it vary (e.g. once a page fault, once a
fatal double fault, once a fatal trap, etc.)?


They are always double faults, 99% of the time with no additional info.
We've seen one mention of java on one of the machines, but the vmcore
didn't seem to mention anything to do with that after the dump.

My colleague informs me that when he did the upgrade to add in the
scheduler stop patch, pretty much every machine panicked when shutting the
java servers down, which is essentially a jail stop.

I've also had two panics when rebooting my test machine to change
kernel settings, although this could be a side effect of the scheduler
patch?

This single test machine is now running with the following non-standard
settings:-
options INVARIANTS
options INVARIANT_SUPPORT
options DDB
options KSTACK_PAGES=12

I've got several vmcores from a number of different machines but none
seem to be of any use, as they don't seem to list any thread that caused
the panic, i.e. no mention of dump or fault.

Is there something else in particular I should be looking for?

Circumstantial evidence seems to indicate uptime may be a factor:
machines with under 2 days of uptime seem much less likely to panic.

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-14 Thread Steven Hartland
- Original Message - 
From: "Andriy Gapon" 


Maybe test it on a couple of machines first just in case I overlooked something
essential, although I have a report from another user that the patch didn't break
anything for him (it was tested for an unrelated issue).


We've got this running on ~40 machines and have just had the first panic
since the update. Unfortunately it doesn't seem to have changed anything :(

We have 352 thread entries starting with:-
#0  sched_switch (td=0x8083e4e0, newtd=0xff0012d838c0, flags=Variable "flags" is not available.)
23 with:-
cpustop_handler () at atomic.h:285
and 16 with:-
#0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562
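
Counts like these can be produced mechanically from a saved backtrace log rather than by eye. The following is a minimal Python sketch, assuming the log is plain text in which each thread's innermost frame starts a line with `#0` (entries that lack the `#0` prefix, like the cpustop_handler line quoted here, would need their own pattern). The function name `tally_top_frames` and the short `sample` text are made up for illustration; they are not part of kgdb or of the original dump.

```python
import re
from collections import Counter

# Matches the innermost frame of a gdb backtrace, with or without an address:
#   "#0  sched_switch (td=..."   or   "#0  0x1234 in some_function ()"
FRAME0 = re.compile(r"^#0\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(", re.MULTILINE)


def tally_top_frames(log_text):
    """Count how many backtraces in log_text begin in each function."""
    return Counter(FRAME0.findall(log_text))


# Illustrative input only: a few frame lines in the style of the dump.
sample = """\
#0  sched_switch (td=0x8083e4e0, newtd=0xff0012d838c0, flags=Variable
#1  0x80391a99 in mi_switch (flags=260, newtd=0x0)
#0  fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:562
#0  sched_switch (td=0x0, newtd=0x0, flags=Variable
"""

# Print the functions ordered from most to least common.
for func, count in tally_top_frames(sample).most_common():
    print(count, func)
```

Only frame `#0` is counted, so deeper frames such as `mi_switch` do not inflate the totals.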

The main message being:-
panic: double fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>Aug 14 15:13:33 amsbld15 syslogd: exiting on signal 15

Fatal double fault
rip = 0x8053b691
rsp = 0xff8d8f356fb0
rbp = 0xff8d8f357210
cpuid = 2; apic id = 02
panic: double fault
cpuid = 2
KDB: stack backtrace:
#0 0x803bb75e at kdb_backtrace+0x5e
#1 0x8038956e at panic+0x2ae
#2 0x805802b6 at dblfault_handler+0x96
#3 0x8056900d at Xdblfault+0xad
stack: 0xff8d8f357000, 4
rsp = 0xff89ae10
Uptime: 2d21h6m18s
Physical memory: 49132 MB
Dumping 17080 MB: 17065...
Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/linprocfs.ko...Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/linprocfs.ko
Reading symbols from /boot/kernel/nullfs.ko...Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/nullfs.ko
#0  sched_switch (td=0x8083e4e0, newtd=0xff0012d838c0, flags=Variable "flags" is not available.)
   at /usr/src/sys/kern/sched_ule.c:1858
1858            cpuid = PCPU_GET(cpuid);
(kgdb) #0  sched_switch (td=0x8083e4e0, newtd=0xff0012d838c0, flags=Variable "flags" is not available.)
   at /usr/src/sys/kern/sched_ule.c:1858
#1  0x80391a99 in mi_switch (flags=260, newtd=0x0)
   at /usr/src/sys/kern/kern_synch.c:451
#2  0x803c5112 in sleepq_timedwait (wchan=0x8083e080, pri=68)
   at /usr/src/sys/kern/subr_sleepqueue.c:644
#3  0x80391efb in _sleep (ident=0x8083e080, lock=0x0,
   priority=Variable "priority" is not available.) at /usr/src/sys/kern/kern_synch.c:230
#4  0x8053ebc9 in scheduler (dummy=Variable "dummy" is not available.)
   at /usr/src/sys/vm/vm_glue.c:807
#5  0x80341767 in mi_startup () at /usr/src/sys/kern/init_main.c:254
#6  0x8016efdc in btext () at /usr/src/sys/amd64/amd64/locore.S:81
#7  0x80863dc8 in sleepq_chains ()
#8  0x80848ae0 in cpu_top ()
#9  0x in ?? ()
#10 0x8083e4e0 in proc0 ()
#11 0x80bb3b90 in ?? ()
#12 0x80bb3b38 in ?? ()
#13 0xff0012d838c0 in ?? ()
#14 0x803aeb19 in sched_switch (td=0x0, newtd=0x0, flags=Variable "flags" is not available.)
   at /usr/src/sys/kern/sched_ule.c:1852
Previous frame inner to this frame (corrupt stack?)
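
The register values in the panic message allow one quick sanity check: whether the stack pointer at the time of the double fault was still inside the thread's kernel stack. The Python sketch below rests on several assumptions that the thread does not confirm: that the `stack: 0xff8d8f357000, 4` line gives the stack's base address and its size in pages, that a page is 4096 bytes as on amd64, and that the addresses as shown in this archive (which may have lost leading digits) are still comparable to each other. The helper name `classify_rsp` is hypothetical.

```python
PAGE_SIZE = 4096  # assumed base page size (amd64)


def classify_rsp(rsp, stack_base, pages, page_size=PAGE_SIZE):
    """Return (status, byte_count) for rsp relative to a downward-growing stack.

    The stack is taken to occupy [stack_base, stack_base + pages * page_size).
    """
    stack_top = stack_base + pages * page_size
    if rsp < stack_base:
        # Below the lowest valid address: the stack has been overrun.
        return ("overflow", stack_base - rsp)
    if rsp > stack_top:
        # Above the highest address: not pointing into this stack at all.
        return ("outside", rsp - stack_top)
    # Inside the stack: report how many bytes remain above the base.
    return ("ok", rsp - stack_base)


# Values copied from the panic message as they appear in this archive.
print(classify_rsp(0xff8d8f356fb0, 0xff8d8f357000, 4))  # rsp
print(classify_rsp(0xff8d8f357210, 0xff8d8f357000, 4))  # rbp
```

Under those assumptions rsp comes out 80 bytes below the stack base while rbp is 528 bytes above it, which would be consistent with the kernel stack being exhausted. If so, a larger KSTACK_PAGES value would be expected to delay or avoid the fault, but that remains a hypothesis until confirmed on a live machine.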

There are some indications that stopping jails could be the
cause of the panics, so on one test box I've enabled INVARIANTS
to see if anything shows up from that.

   Regards
   Steve


