Re: panic in deadlkres() on r267110

2014-06-06 Thread Sean Bruno
On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote:
> Two machines in the cluster panic last night with the same backtrace.
> It is unclear yet exactly what was happening on the systems, but both
> are port building machines using ports-mgmt/tinderbox.
> 
> Any ideas or information on how to further debug this would be
> appreciated.
> 
These machines were happily running r266621 previously to this update
yesterday.  So, that gives us a bisection point.

sean


> Script started on Fri Jun  6 14:01:53 2014
> command: /bin/sh
> # uname -a
> FreeBSD redbuild04.nyi.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1 
> r267110: Thu Jun  5 15:57:43 UTC 2014 
> sbr...@redbuild04.nyi.freebsd.org:/usr/obj/usr/src/sys/REDBUILD  amd64
> # kgdb ./kernel.debug /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> panic: deadlkres: possible deadlock detected on allproc_lock
> 
> cpuid = 17
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe1838702a20
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe1838702ad0
> panic() at panic+0x155/frame 0xfe1838702b50
> deadlkres() at deadlkres+0x42a/frame 0xfe1838702bb0
> fork_exit() at fork_exit+0x9a/frame 0xfe1838702bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe1838702bf0
> --- trap 0, rip = 0, rsp = 0xfe1838702cb0, rbp = 0 ---
> KDB: enter: panic
> 
> Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/zfs.ko.symbols
> Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> Loaded symbols for /boot/kernel/opensolaris.ko.symbols
> Reading symbols from /boot/kernel/ums.ko.symbols...done.
> Loaded symbols for /boot/kernel/ums.ko.symbols
> Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/linprocfs.ko.symbols
> Reading symbols from /boot/kernel/linux.ko.symbols...done.
> Loaded symbols for /boot/kernel/linux.ko.symbols
> #0  doadump (textdump=-946873840) at pcpu.h:219
> 219 __asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) bt
> #0  doadump (textdump=-946873840) at pcpu.h:219
> #1  0x8034e865 in db_fncall (dummy1=, 
> dummy2=, dummy3=, 
> dummy4=) at /usr/src/sys/ddb/db_command.c:578
> #2  0x8034e54d in db_command (cmd_table=0x0)
> at /usr/src/sys/ddb/db_command.c:449
> #3  0x8034e2c4 in db_command_loop ()
> at /usr/src/sys/ddb/db_command.c:502
> #4  0x80350d20 in db_trap (type=, code=0)
> at /usr/src/sys/ddb/db_main.c:231
> #5  0x809a9bd9 in kdb_trap (type=3, code=0, tf=)
> at /usr/src/sys/kern/subr_kdb.c:656
> #6  0x80dc00e3 in trap (frame=0xfe1838702a00)
> at /usr/src/sys/amd64/amd64/trap.c:551
> #7  0x80da29c2 in calltrap ()
> at /usr/src/sys/amd64/amd64/exception.S:231
> #8  0x809a933e in kdb_enter (why=0x81039a72 "panic", 
> msg=) at cpufunc.h:63
> #9  0x8096a8b5 in panic (fmt=)
> at /usr/src/sys/kern/kern_shutdown.c:749
> #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203
> #11 0x8093170a in fork_exit (callout=0x8090cd40 , 
> arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977
> ---Type  to continue, or q  to quit---
> #12 0x80da2efe in fork_trampoline ()
> at /usr/src/sys/amd64/amd64/exception.S:605
> #13 0x in ?? ()
> Current language:  auto; currently minimal
> (kgdb) fr 10
> #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203
> 203 panic("%s: possible deadlock detected on 
> allproc_lock\n",
> (kgdb) l
> 198  * priority inversion problem leading to starvation.
> 199  * If the lock can't be held after 100 tries, panic.
> 200  */
> 201 if (!sx_try_slock(&allproc_lock)) {
> 202 if (tryl > 100)
> 203 panic("%s: possible deadlock detected on 
> allproc_lock\n",
> 204 __func__);
> 205 tryl++;
> 206 pause("allproc", sleepfreq * hz);
> 207 continue;
> (kgdb) up
> #11 0x8093170a in fork_exit (callout=0x8090cd40 , 
> arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977
> 977 callout(arg, frame);
> (kgdb) l
> 972  * cpu_set_fork_handler intercepts this function call to
> 973  * have this call a non-ret

Re: panic in deadlkres() on r267110

2014-06-06 Thread Ryan Stone
On Fri, Jun 6, 2014 at 10:12 AM, Glen Barber  wrote:
> Two machines in the cluster panic last night with the same backtrace.
> It is unclear yet exactly what was happening on the systems, but both
> are port building machines using ports-mgmt/tinderbox.
>
> Any ideas or information on how to further debug this would be
> appreciated.

The first thing that I would do is try to figure out who holds the
allproc_lock.  If the lock is held exclusively then
the pointer to the thread struct will be in allproc_lock.sx_lock.

If the lock is held shared then it gets trickier.  I'm not sure if
there is a better possible way than doing a thread apply all bt in
kgdb and finding which stack(s) could hold allproc_lock.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic in deadlkres() on r267110

2014-06-06 Thread Glen Barber
On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote:
> On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote:
> > Two machines in the cluster panic last night with the same backtrace.
> > It is unclear yet exactly what was happening on the systems, but both
> > are port building machines using ports-mgmt/tinderbox.
> > 
> > Any ideas or information on how to further debug this would be
> > appreciated.
> > 
> These machines were happily running r266621 previously to this update
> yesterday.  So, that gives us a bisection point.
> 

Some more debug information.  Thank you to Attilio for information on
what data to get.

Script started on Fri Jun  6 15:00:53 2014
command: /bin/sh
# kgdb ./kernel.debug /var/crash/vmcore.0
[...]
#0  doadump (textdump=-946873840) at pcpu.h:219
219 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) p/x allproc_lock.sx_lock
$1 = 0xf813ae7f4924
Current language:  auto; currently minimal
(kgdb) p ((struct thread *)0xf813ae7f4924)
$2 = (struct thread *) 0xf813ae7f4924
(kgdb) p $2->td_tid
$3 = 0
(kgdb) ^D
# ^D
Script done on Fri Jun  6 15:03:19 2014

Glen



pgpAoA_3Wzmb1.pgp
Description: PGP signature


Re: panic in deadlkres() on r267110

2014-06-06 Thread Attilio Rao
On Fri, Jun 6, 2014 at 5:06 PM, Glen Barber  wrote:
> On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote:
>> On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote:
>> > Two machines in the cluster panic last night with the same backtrace.
>> > It is unclear yet exactly what was happening on the systems, but both
>> > are port building machines using ports-mgmt/tinderbox.
>> >
>> > Any ideas or information on how to further debug this would be
>> > appreciated.
>> >
>> These machines were happily running r266621 previously to this update
>> yesterday.  So, that gives us a bisection point.
>>
>
> Some more debug information.  Thank you to Attilio for information on
> what data to get.
>
> Script started on Fri Jun  6 15:00:53 2014
> command: /bin/sh
> # kgdb ./kernel.debug /var/crash/vmcore.0
> [...]
> #0  doadump (textdump=-946873840) at pcpu.h:219
> 219 __asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) p/x allproc_lock.sx_lock
> $1 = 0xf813ae7f4924
> Current language:  auto; currently minimal
> (kgdb) p ((struct thread *)0xf813ae7f4924)

The actual thread address is: 0xf813ae7f4920.

Then look at the GDB threads list and match with the tid.

Attilio
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic in deadlkres() on r267110

2014-06-06 Thread Glen Barber
On Fri, Jun 06, 2014 at 11:06:40AM -0400, Glen Barber wrote:
> On Fri, Jun 06, 2014 at 07:23:49AM -0700, Sean Bruno wrote:
> > On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote:
> > > Two machines in the cluster panic last night with the same backtrace.
> > > It is unclear yet exactly what was happening on the systems, but both
> > > are port building machines using ports-mgmt/tinderbox.
> > > 
> > > Any ideas or information on how to further debug this would be
> > > appreciated.
> > > 
> > These machines were happily running r266621 previously to this update
> > yesterday.  So, that gives us a bisection point.
> > 
> 
> Some more debug information.  Thank you to Attilio for information on
> what data to get.
> 

Sorry, I did not mask the bits properly.

Script started on Fri Jun  6 15:20:14 2014
command: /bin/sh
# kgdb ./kernel.debug /var/crash/vmcore.0
[...]
#0  doadump (textdump=-946873840) at pcpu.h:219
219 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) p/x allproc_lock.sx_lock
$1 = 0xf813ae7f4924
Current language:  auto; currently minimal
(kgdb) p ((struct thread *)0xf813ae7f4920)
$4 = (struct thread *) 0xf813ae7f4920
(kgdb) p $4->td_tid
$5 = 195038
(kgdb) tid 195038
[Switching to thread 94414 (Thread 195038)]#0  0x80daf1f8 in 
cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432
1432savectx(&stoppcbs[cpu]);
(kgdb) bt
#0  0x80daf1f8 in cpustop_handler ()
at /usr/src/sys/amd64/amd64/mp_machdep.c:1432
#1  0x80daf1bf in ipi_nmi_handler ()
at /usr/src/sys/amd64/amd64/mp_machdep.c:1417
#2  0x80dbfcba in trap (frame=0xfe17d5389f30)
at /usr/src/sys/amd64/amd64/trap.c:189
#3  0x80da2e03 in nmi_calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:504
#4  0x8092fa39 in fork1 (td=0xf813ae7f4920, flags=20, pages=Cannot 
access memory at address 0x4
)
at /usr/src/sys/kern/kern_fork.c:268
#5  0x8092f00f in sys_fork (td=0xf813ae7f4920, 
uap=) at /usr/src/sys/kern/kern_fork.c:106
#6  0x80dc118b in amd64_syscall (td=0xf813ae7f4920, traced=0)
at subr_syscall.c:133
#7  0x80da2cab in Xfast_syscall ()
at /usr/src/sys/amd64/amd64/exception.S:390
#8  0x000800d46d1a in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) 

Glen



pgpI1kaMzAPLk.pgp
Description: PGP signature


Re: panic in deadlkres() on r267110

2014-06-06 Thread Ryan Stone
On Fri, Jun 6, 2014 at 11:33 AM, Glen Barber  wrote:
> #4  0x8092fa39 in fork1 (td=0xf813ae7f4920, flags=20, 
> pages=Cannot access memory at address 0x4
> )
> at /usr/src/sys/kern/kern_fork.c:268

This looks like a suspicious piece of code.  Could something be
causing the kernel to go into an infinite loop there?
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic in deadlkres() on r267110

2014-06-06 Thread Sean Bruno
On Fri, 2014-06-06 at 10:12 -0400, Glen Barber wrote:
> Two machines in the cluster panic last night with the same backtrace.
> It is unclear yet exactly what was happening on the systems, but both
> are port building machines using ports-mgmt/tinderbox.
> 
> Any ideas or information on how to further debug this would be
> appreciated.
> 
> Script started on Fri Jun  6 14:01:53 2014
> command: /bin/sh
> # uname -a
> FreeBSD redbuild04.nyi.freebsd.org 11.0-CURRENT FreeBSD 11.0-CURRENT #1 
> r267110: Thu Jun  5 15:57:43 UTC 2014 
> sbr...@redbuild04.nyi.freebsd.org:/usr/obj/usr/src/sys/REDBUILD  amd64
> # kgdb ./kernel.debug /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> panic: deadlkres: possible deadlock detected on allproc_lock
> 
> cpuid = 17
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe1838702a20
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe1838702ad0
> panic() at panic+0x155/frame 0xfe1838702b50
> deadlkres() at deadlkres+0x42a/frame 0xfe1838702bb0
> fork_exit() at fork_exit+0x9a/frame 0xfe1838702bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe1838702bf0
> --- trap 0, rip = 0, rsp = 0xfe1838702cb0, rbp = 0 ---
> KDB: enter: panic
> 
> Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/zfs.ko.symbols
> Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> Loaded symbols for /boot/kernel/opensolaris.ko.symbols
> Reading symbols from /boot/kernel/ums.ko.symbols...done.
> Loaded symbols for /boot/kernel/ums.ko.symbols
> Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/linprocfs.ko.symbols
> Reading symbols from /boot/kernel/linux.ko.symbols...done.
> Loaded symbols for /boot/kernel/linux.ko.symbols
> #0  doadump (textdump=-946873840) at pcpu.h:219
> 219 __asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) bt
> #0  doadump (textdump=-946873840) at pcpu.h:219
> #1  0x8034e865 in db_fncall (dummy1=, 
> dummy2=, dummy3=, 
> dummy4=) at /usr/src/sys/ddb/db_command.c:578
> #2  0x8034e54d in db_command (cmd_table=0x0)
> at /usr/src/sys/ddb/db_command.c:449
> #3  0x8034e2c4 in db_command_loop ()
> at /usr/src/sys/ddb/db_command.c:502
> #4  0x80350d20 in db_trap (type=, code=0)
> at /usr/src/sys/ddb/db_main.c:231
> #5  0x809a9bd9 in kdb_trap (type=3, code=0, tf=)
> at /usr/src/sys/kern/subr_kdb.c:656
> #6  0x80dc00e3 in trap (frame=0xfe1838702a00)
> at /usr/src/sys/amd64/amd64/trap.c:551
> #7  0x80da29c2 in calltrap ()
> at /usr/src/sys/amd64/amd64/exception.S:231
> #8  0x809a933e in kdb_enter (why=0x81039a72 "panic", 
> msg=) at cpufunc.h:63
> #9  0x8096a8b5 in panic (fmt=)
> at /usr/src/sys/kern/kern_shutdown.c:749
> #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203
> #11 0x8093170a in fork_exit (callout=0x8090cd40 , 
> arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977
> ---Type  to continue, or q  to quit---
> #12 0x80da2efe in fork_trampoline ()
> at /usr/src/sys/amd64/amd64/exception.S:605
> #13 0x in ?? ()
> Current language:  auto; currently minimal
> (kgdb) fr 10
> #10 0x8090d16a in deadlkres () at /usr/src/sys/kern/kern_clock.c:203
> 203 panic("%s: possible deadlock detected on 
> allproc_lock\n",
> (kgdb) l
> 198  * priority inversion problem leading to starvation.
> 199  * If the lock can't be held after 100 tries, panic.
> 200  */
> 201 if (!sx_try_slock(&allproc_lock)) {
> 202 if (tryl > 100)
> 203 panic("%s: possible deadlock detected on 
> allproc_lock\n",
> 204 __func__);
> 205 tryl++;
> 206 pause("allproc", sleepfreq * hz);
> 207 continue;
> (kgdb) up
> #11 0x8093170a in fork_exit (callout=0x8090cd40 , 
> arg=0x0, frame=0xfe1838702c00) at /usr/src/sys/kern/kern_fork.c:977
> 977 callout(arg, frame);
> (kgdb) l
> 972  * cpu_set_fork_handler intercepts this function call to
> 973  * have this call a non-return function to stay in kernel 
> mode.
> 974  * initproc has its own fork handler, but it does return.
> 975