odd issues with DDB vs GDB

2010-09-15 Thread Patrick Mahan

All,

I am trying to debug a system hang occurring on my HP ProLiant G6 running
some of our kernel software.  Under certain test loads the system hangs
completely: no keyboard, no console, etc.  I suspect the cause is some of the
kernel code I have inherited, which contains a lot of locking (lots of data
structures, each having its own mutex lock (sleepable)).

I rebuilt the kernel to include the following:

options KDB
options DDB
options GDB
options MUTEX_NOINLINE
options MUTEX_DEBUG
options WITNESS
options WITNESS_SKIPSPIN

options SW_WATCHDOG  # Enable to force us into the debugger on a hang

When the watchdog fires, this places me in the kernel DDB debugger.  The
backtrace shown by DDB makes a lot of sense: it shows we are blocked in
_mtx_lock_flags()+0x6f.

Great, so I go to enable GDB:

db> gdb
Step to enter the remote GDB backend.
db> s
$T0510:a6f86c80fff*";thread:186c0;#62
gdb kernel.debug
Current directory is ~/devel/pm_bz5486/FBSD80REL/amd64/obj/usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/MPATH/
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

(gdb) target remote 10.10.29.111:7028
Remote debugging using 10.10.29.111:7028

0x806cf8a6 in kdb_init () at /usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/kern/subr_kdb.c:361
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
warning: shared library handler failed to enable breakpoint

gdb>

So right away I am somewhat suspicious, as it is showing me a completely
different entry point.

DDB showed

Tracing pid 0 tid 100032 td 0xff0002668390
breakpoint() at breakpoint+0x5
kdb_enter() at kdb_enter+0x52
watchdog_fire() at watchdog_fire+0xda
hardclock() at hardclock+0x73
lapic_handle_timer() at lapic_handle_timer+0x120
Xtimerint() at Xtimerint+0x8c

But GDB is showing the above.

A backtrace (bt) in GDB does not show the same stack signature.
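
For anyone wanting to cross-check the two debuggers by hand, the stop-reply
packet printed right after `s` can be decoded.  What follows is a minimal
Python sketch and not part of the original session.  It assumes the standard
GDB remote serial protocol framing (`$payload#checksum`, with `*` run-length
encoding) and assumes that register number 0x10 is rip in gdb's amd64
register numbering.  The helper names are made up for illustration.

```python
def rsp_checksum(payload: str) -> int:
    """Modulo-256 sum of the payload bytes, as used by the remote protocol."""
    return sum(payload.encode("ascii")) % 256


def rsp_expand(payload: str) -> str:
    """Undo run-length encoding: 'c*N' means c followed by (ord(N) - 29) more c's."""
    out = []
    i = 0
    while i < len(payload):
        ch = payload[i]
        if ch == "*" and out and i + 1 < len(payload):
            repeat = ord(payload[i + 1]) - 29
            out.append(out[-1][-1] * repeat)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_packet(packet: str) -> str:
    """Check the checksum of a '$...#xx' packet and return the expanded payload."""
    if not packet.startswith("$"):
        raise ValueError("packet does not start with '$'")
    body, _, csum = packet[1:].rpartition("#")
    if rsp_checksum(body) != int(csum, 16):
        raise ValueError("checksum mismatch")
    return rsp_expand(body)


# The packet exactly as it appeared on the console after 'db> s'.
packet = '$T0510:a6f86c80fff*";thread:186c0;#62'
reply = parse_packet(packet)

# 'Tnn' is a stop reply carrying signal nn, followed by 'key:value;' pairs.
signal = int(reply[1:3], 16)
fields = dict(f.split(":") for f in reply[3:].split(";") if f)

# Register values are sent in target byte order (little-endian on amd64).
# Treating register 0x10 as rip is an assumption about gdb's numbering.
rip = int.from_bytes(bytes.fromhex(fields["10"]), "little")
tid = int(fields["thread"], 16)

print(f"signal={signal} rip={rip:#x} tid={tid}")
```

Decoded this way, the packet reports signal 5 (SIGTRAP) and thread 0x186c0,
which is the tid 100032 from the DDB trace, and the low 32 bits of the
register value match the 0x806cf8a6 address printed by gdb.  So the stub and
DDB appear to agree at least on which thread stopped; the disagreement is in
how plain gdb interprets that state.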

I have attached the complete log for those who are interested.  Is there a
reason for the wide difference between DDB and GDB?  Am I invoking gdb
incorrectly?

Thanks for the education, as always!

Patrick

Debugging a system hang.  Enabled watchdog(4), built kernel with KDB, DDB and
GDB.  I am trying to debug this via remote GDB, but what DDB shows for a stack
trace and what GDB shows are two separate animals.

External serial port setup with the following in /boot/loader.conf

console="comconsole vidconsole"
comconsole_speed=9600
hint.uart.0.flags="0x90"

Serial is accessed via a Cyclades ACS console server: 'telnet 10.10.29.111
70XX', where XX is the physical port number.

System comes up fine, testing is initiated, eventually the system hangs and
the watchdog fires, dropping us into DDB:

DDB output

db> trace
Tracing pid 0 tid 100032 td 0xff0002668390
breakpoint() at breakpoint+0x5
kdb_enter() at kdb_enter+0x52
watchdog_fire() at watchdog_fire+0xda
hardclock() at hardclock+0x73
lapic_handle_timer() at lapic_handle_timer+0x120
Xtimerint() at Xtimerint+0x8c
--- interrupt, rip = 0x80688532, rsp = 0xff800011e460, rbp = 0xff800011e4c0 ---
_mtx_lock_sleep() at _mtx_lock_sleep+0x92
_mtx_lock_flags() at _mtx_lock_flags+0x6f
VCDgetWithIIFremote() at VCDgetWithIIFremote+0x3f
ProcessDataPkt() at ProcessDataPkt+0x3dc
ip_input() at ip_input+0xa24
netisr_dispatch_src() at netisr_dispatch_src+0xe3
netisr_dispatch() at netisr_dispatch+0x20
gif_input() at gif_input+0x324
in_gif_input() at in_gif_input+0x28f
encap4_input() at encap4_input+0x1b8
ip_input() at ip_input+0xd1a
netisr_dispatch_src() at netisr_dispatch_src+0xe3
netisr_dispatch() at netisr_dispatch+0x20
ether_demux() at ether_demux+0x1f3
ether_input() at ether_input+0x4ab
em_rxeof() at em_rxeof+0x410
em_handle_que() at em_handle_que+0x6f
taskqueue_run() at taskqueue_run+0xbb
taskqueue_thread_loop() at taskqueue_thread_loop+0x33
fork_exit() at fork_exit+0xba
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff800011ed30, rbp = 0 ---
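
When reading a trace like the one above, the frames before the
"--- interrupt" marker belong to the watchdog and debugger entry path, and the
frames after it show what the thread was doing when the timer interrupt
arrived.  As a rough illustration only, a small Python sketch can split a
pasted trace at that marker.  The function name and the regular expression are
made up for this example, and only the first few frames of the trace are
included.

```python
import re

# Matches DDB frame lines such as "kdb_enter() at kdb_enter+0x52".
FRAME = re.compile(r"^(\w+)\(\) at \w+\+(0x[0-9a-f]+)$")


def split_trace(lines):
    """Split DDB frames at the first '--- interrupt' marker.

    Returns (entry_frames, interrupted_frames) as lists of (name, offset).
    Lines that are not frame lines are ignored.
    """
    before, after = [], []
    target = before
    for line in lines:
        line = line.strip()
        if line.startswith("--- interrupt"):
            target = after
            continue
        m = FRAME.match(line)
        if m:
            target.append((m.group(1), int(m.group(2), 16)))
    return before, after


trace = """\
breakpoint() at breakpoint+0x5
kdb_enter() at kdb_enter+0x52
watchdog_fire() at watchdog_fire+0xda
hardclock() at hardclock+0x73
lapic_handle_timer() at lapic_handle_timer+0x120
Xtimerint() at Xtimerint+0x8c
--- interrupt, rip = 0x80688532, rsp = 0xff800011e460, rbp = 0xff800011e4c0 ---
_mtx_lock_sleep() at _mtx_lock_sleep+0x92
_mtx_lock_flags() at _mtx_lock_flags+0x6f
VCDgetWithIIFremote() at VCDgetWithIIFremote+0x3f
ProcessDataPkt() at ProcessDataPkt+0x3dc
""".splitlines()

entry, stuck = split_trace(trace)
for name, offset in stuck:
    print(f"{name}+{offset:#x}")
```

The first interrupted frame is _mtx_lock_sleep(), reached from
_mtx_lock_flags() in VCDgetWithIIFremote(), which is the lock acquisition
referred to in the original message.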

db>gdb
Step to enter the remote GDB backend.
db>s
^]
telnet> quit
#
# Enter the debugger via remote gdb
#
gdb kernel.debug
Current directory is ~/devel/pm_bz5486/FBSD80REL/amd64/obj/usr/home/pmahan/devel/pm_bz5486/FBSD80REL/src/sys/MPATH/
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"

Re: odd issues with DDB vs GDB

2010-09-16 Thread John Baldwin
On Wednesday, September 15, 2010 8:01:19 pm Patrick Mahan wrote:
> All,
>
> I am trying to debug a system hang occurring on my HP Proliant G6 running
> some of our kernel software.  I am seeing that under certain test loads, the
> system will hang-up complete, no keyboard, no console, etc.  I suspect it is
> some of the kernel code that I have inherited that contains a lot of locking
> (lots of data structure, each having their own mutex lock (sleepable)).

You need to use 'kgdb' rather than 'gdb' on kernel.debug.

-- 
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: odd issues with DDB vs GDB

2010-09-16 Thread Patrick Mahan



John Baldwin wrote:
> On Wednesday, September 15, 2010 8:01:19 pm Patrick Mahan wrote:
>> All,
>>
>> I am trying to debug a system hang occurring on my HP Proliant G6 running
>> some of our kernel software.  I am seeing that under certain test loads, the
>> system will hang-up complete, no keyboard, no console, etc.  I suspect it is
>> some of the kernel code that I have inherited that contains a lot of locking
>> (lots of data structure, each having their own mutex lock (sleepable)).
>
> You need to use 'kgdb' rather than 'gdb' on kernel.debug.

Doh! *-(

I'm so used to gdb even though I use kgdb for looking at crash dumps.

Thanks,

Patrick