Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
On Thursday 25 September 2008 01:34:06 am Jeff Wheelhouse wrote:
> On Sep 24, 2008, at 12:34 PM, John Baldwin wrote:
> > On Wednesday 24 September 2008 12:17:56 pm Jeff Wheelhouse wrote:
> > > panic: lockmgr: thread 0xff0050858350, not exclusive lock holder 0xff00074959f0 unlocking
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > > panic() at panic+0x17a
> > > _lockmgr() at _lockmgr+0x872
> > > VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> > > null_unlock() at null_unlock+0xff
> > > VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> > > nullfs_mount() at nullfs_mount+0x244
> > > vfs_donmount() at vfs_donmount+0xe4d
> > > nmount() at nmount+0xa5
> > > syscall() at syscall+0x254
> > > Xfast_syscall() at Xfast_syscall+0xab
> > > --- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp = 0x7fffdfc8, rbp = 0x7fffdfd0 ---
> >
> > Can you use gdb or the like to get the source file/line for the
> > nullfs_mount+0x244 frame?
>
> Got it again, this time with the full debug kernel, and I'm getting the
> same weird results from gdb, so I'll go ahead and post it:
>
> panic: lockmgr: thread 0xff0003e499f0, not exclusive lock holder 0xff000a5e16a0 unlocking
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> panic() at panic+0x17a
> _lockmgr() at _lockmgr+0x872
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> null_unlock() at null_unlock+0xff
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> nullfs_mount() at nullfs_mount+0x244
> vfs_donmount() at vfs_donmount+0xe4d
> nmount() at nmount+0xa5
> syscall() at syscall+0x254
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp = 0x7fffe1c8, rbp = 0x7fffe1d0 ---
>
> $ gdb /boot/kernel/nullfs.ko
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> (gdb) l *nullfs_mount+0x244
> 0x9c4 is in nullfs_mount (namei.h:163).
> 158             struct thread *td)
> 159     {
> 160             ndp->ni_cnd.cn_nameiop = op;
> 161             ndp->ni_cnd.cn_flags = flags;
> 162             ndp->ni_segflg = segflg;
> 163             ndp->ni_dirp = namep;
> 164             ndp->ni_cnd.cn_thread = td;
> 165     }
> 166
> 167     #define NDF_NO_DVP_RELE 0x0001
> (gdb)
>
> (That's NDINIT(), but line 163 doesn't look like it belongs in the middle
> of a call stack. There's a VOP_UNLOCK a few lines above NDINIT() in
> mount_nullfs(), and another one some ways farther on in the function.)

It's probably the one just before the NDINIT. Note that the return address
in the call stack points to the next instruction to be executed after the
call to VOP_UNLOCK(), so it can sometimes end up referring to a later line
in the source code than the actual function call:

        if ((mp->mnt_vnodecovered->v_op == &null_vnodeops) &&
            VOP_ISLOCKED(mp->mnt_vnodecovered)) {
                VOP_UNLOCK(mp->mnt_vnodecovered, 0);
                isvnunlocked = 1;
        }
        /*
         * Find lower node
         */
        NDINIT(ndp, LOOKUP, FOLLOW|LOCKLEAF, UIO_SYSSPACE, target, td);
        error = namei(ndp);

Can you 'p *mp'? I'm curious whether mp->mnt_vnodecovered is NULL (in which
case, why didn't the two tests in the if() fail?)

> The good news is we took this particular machine out of production and
> came up with a synthetic test based on our in-house code that can probably
> reliably reproduce this within a few minutes. As you might expect, the test
> involves hammering the same nullfs mount point with mounts and umounts from
> multiple processes without any external synchronization.

Ok.  Reproducibility is good. :)

-- 
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]
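The panic string says the thread calling unlock is not the exclusive holder. Below is a user-space toy model of that owner check. It is not lockmgr's API, and every name in it is hypothetical. It shows why a guard of the form "if locked, unlock" is only safe when "locked" means "locked by the caller". If the status check also returns a nonzero value for "locked by someone else", a non-owner can reach the unlock. Whether VOP_ISLOCKED() behaves that way for the covered vnode on 7.0 is an assumption to check against the sources. This thread does not establish it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status values: free, held by the caller, held by another thread. */
enum lock_status { UNLOCKED = 0, EXCL_MINE, EXCL_OTHER };

struct toy_lock {
    int owner;  /* thread id of the exclusive holder, 0 if free */
};

static enum lock_status
toy_islocked(const struct toy_lock *lk, int tid)
{
    if (lk->owner == 0)
        return UNLOCKED;
    return (lk->owner == tid) ? EXCL_MINE : EXCL_OTHER;
}

static void
toy_acquire(struct toy_lock *lk, int tid)
{
    assert(lk->owner == 0);  /* this model has no waiting/contention */
    lk->owner = tid;
}

/* Returns 0 on success, -1 where a real lockmgr would panic instead. */
static int
toy_release(struct toy_lock *lk, int tid)
{
    if (lk->owner != tid) {
        fprintf(stderr, "thread %d, not exclusive lock holder %d unlocking\n",
            tid, lk->owner);
        return -1;
    }
    lk->owner = 0;
    return 0;
}

/* Guard that only asks "is it locked at all?": EXCL_OTHER is nonzero too. */
static int
unlock_if_locked_loose(struct toy_lock *lk, int tid)
{
    if (toy_islocked(lk, tid))
        return toy_release(lk, tid);
    return 0;
}

/* Guard that only unlocks a lock the caller itself holds. */
static int
unlock_if_locked_strict(struct toy_lock *lk, int tid)
{
    if (toy_islocked(lk, tid) == EXCL_MINE)
        return toy_release(lk, tid);
    return 0;
}
```

In this model, thread 2 calls `unlock_if_locked_loose()` while thread 1 holds the lock. That is the analogue of the reported panic. The strict variant simply skips the unlock.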
Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
On Sep 25, 2008, at 8:45 AM, John Baldwin wrote:
> It's probably the one just before the NDINIT. Note that the return address
> in the call stack points to the next instruction to be executed after the
> call to VOP_UNLOCK(), so it can sometimes end up referring to a later line
> in the source code than the actual function call:

It seems we're six or seven lines of source down, not on the next line,
which was the source of my confusion. But if you're not confused, I won't
be. :)

> Can you 'p *mp'? I'm curious whether mp->mnt_vnodecovered is NULL (in which
> case, why didn't the two tests in the if() fail?)

Apparently I can't; we're stuck with DDB, since we can't get a crash dump and
the serial console goes to a hardware terminal server. I'm afraid I'm not
quite clever enough to find the right data structure without symbols.

I could throw a printf in there, or add a panic if mp->mnt_vnodecovered is
NULL, if you think that would help. The printf will probably alter timings
significantly, so I might need some guidance on what to print, and under
what conditions.

Thanks,
Jeff
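The return-address effect can be made concrete. DDB prints each frame as symbol+offset, and the offset is the saved return address. A common trick is to look up one byte earlier, so the address falls inside the call instruction itself. gdb does the equivalent internally for caller frames. The helper below is a sketch under those assumptions; `parse_frame()` and `call_site_offset()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One DDB backtrace frame such as "nullfs_mount+0x244". */
struct frame_ref {
    char sym[64];
    unsigned long off;  /* offset of the saved return address */
};

/* Split "sym+0xOFF" into symbol and offset; returns 0 on success, -1 on error. */
static int
parse_frame(const char *s, struct frame_ref *fr)
{
    const char *plus = strrchr(s, '+');
    char *end;
    size_t n;

    if (plus == NULL)
        return -1;
    n = (size_t)(plus - s);
    if (n == 0 || n >= sizeof(fr->sym))
        return -1;
    memcpy(fr->sym, s, n);
    fr->sym[n] = '\0';
    /* base 16 also accepts an optional "0x" prefix */
    fr->off = strtoul(plus + 1, &end, 16);
    if (end == plus + 1 || *end != '\0')
        return -1;
    return 0;
}

/*
 * The return address points at the instruction after the call, so step
 * back one byte to land inside the call instruction itself.
 */
static unsigned long
call_site_offset(const struct frame_ref *fr)
{
    assert(fr->off > 0);
    return fr->off - 1;
}
```

For this trace, that gives `l *nullfs_mount+0x243`. NDINIT() is an inline function from namei.h, so the instruction right after the call is attributed to namei.h:163 rather than to a line in nullfs_mount(). VOP_UNLOCK() is also an inline wrapper, so the adjusted lookup may still land in a header file line; treat the result as a hint. The gdb output also pins down where the function starts inside nullfs.ko: 0x9c4 - 0x244 = 0x780.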
Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
On Thursday 25 September 2008 03:29:20 pm Jeff Wheelhouse wrote:
> On Sep 25, 2008, at 8:45 AM, John Baldwin wrote:
> > It's probably the one just before the NDINIT. Note that the return address
> > in the call stack points to the next instruction to be executed after the
> > call to VOP_UNLOCK(), so it can sometimes end up referring to a later line
> > in the source code than the actual function call:
>
> It seems we're six or seven lines of source down, not on the next line,
> which was the source of my confusion. But if you're not confused, I won't
> be. :)
>
> > Can you 'p *mp'? I'm curious whether mp->mnt_vnodecovered is NULL (in which
> > case, why didn't the two tests in the if() fail?)
>
> Apparently I can't; we're stuck with DDB, since we can't get a crash dump and
> the serial console goes to a hardware terminal server. I'm afraid I'm not
> quite clever enough to find the right data structure without symbols.
>
> I could throw a printf in there, or add a panic if mp->mnt_vnodecovered is
> NULL, if you think that would help. The printf will probably alter timings
> significantly, so I might need some guidance on what to print, and under
> what conditions.

You could perhaps use KTR instead of printf and then use 'show ktr' from
DDB. This won't have the same impact on timing as printf(). I would include
PIDs in any KTR traces you add so it's easier to untangle the interleaved
entries from multiple CPUs.

Also, if you have a good test case, it might be worth grabbing a box without
gmirror that can generate a crashdump and reproducing it there.

-- 
John Baldwin
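KTR disturbs timing less than printf() because recording an event only stores a few words into a fixed-size in-kernel ring buffer. Nothing goes to the console at trace time, and DDB's 'show ktr' displays the buffer afterwards. The user-space model below illustrates that idea plus per-PID filtering of interleaved entries. Every name in it is hypothetical. In the kernel, the real interface is the CTR0() through CTR6() macros from <sys/ktr.h>, enabled with options KTR along with KTR_ENTRIES, KTR_COMPILE and KTR_MASK. The real buffer index is claimed atomically rather than with the plain increment used here.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_ENTRIES 8  /* hypothetical ring size; must be a power of two */
_Static_assert((TRACE_ENTRIES & (TRACE_ENTRIES - 1)) == 0,
    "TRACE_ENTRIES must be a power of two");

struct trace_entry {
    unsigned long seq;  /* global order; stands in for a timestamp */
    int cpu;
    int pid;
    const char *fmt;    /* format string is stored, not rendered */
    const void *arg0, *arg1;
};

struct trace_buf {
    struct trace_entry ent[TRACE_ENTRIES];
    unsigned long next;  /* total number of entries ever written */
};

/* Record one event: a handful of stores, no formatting, no console I/O.
 * Single-threaded model; the oldest entry is overwritten when full. */
static void
trace_rec(struct trace_buf *tb, int cpu, int pid, const char *fmt,
    const void *a0, const void *a1)
{
    struct trace_entry *e = &tb->ent[tb->next & (TRACE_ENTRIES - 1)];

    e->seq = tb->next++;
    e->cpu = cpu;
    e->pid = pid;
    e->fmt = fmt;
    e->arg0 = a0;
    e->arg1 = a1;
}

/* Copy the surviving entries for one pid into out[], oldest first.
 * Returns the number of entries copied (at most max). */
static size_t
trace_by_pid(const struct trace_buf *tb, int pid, struct trace_entry *out,
    size_t max)
{
    unsigned long first = tb->next > TRACE_ENTRIES ? tb->next - TRACE_ENTRIES : 0;
    size_t n = 0;

    assert(out != NULL || max == 0);
    for (unsigned long i = first; i < tb->next && n < max; i++) {
        const struct trace_entry *e = &tb->ent[i & (TRACE_ENTRIES - 1)];
        if (e->pid == pid)
            out[n++] = *e;
    }
    return n;
}
```

The design trades completeness for low overhead. Only the last TRACE_ENTRIES events survive, which is usually enough to see what each PID did just before the panic.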
Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
On Sep 25, 2008, at 3:53 PM, John Baldwin wrote:
> You could perhaps use KTR instead of printf and then use 'show ktr' from
> DDB. This won't have the same impact on timing as printf(). I would include
> PIDs in any KTR traces you add so it's easier to untangle the interleaved
> entries from multiple CPUs.

OK. While I educate myself about how KTR works: what would you like to see?
Just mp->mnt_vnodecovered?

> Also, if you have a good test case, it might be worth grabbing a box without
> gmirror that can generate a crashdump and reproducing it there.

Not an option for us right now; spare 8-core boxes are hard to come by. We're
looking for a USB hard drive or something else we can dump to.

Thanks,
Jeff
Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
Jeff Wheelhouse wrote:
> > Also, if you have a good test case, it might be worth grabbing a box
> > without gmirror that can generate a crashdump and reproducing it there.
>
> Not an option for us right now; spare 8-core boxes are hard to come by.
> We're looking for a USB hard drive or something else we can dump to.

Can you set your dump device to the underlying GEOM component's swap
partition rather than to the gmirror device...?

-- 
Antony
Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
We got the same panic again, this time after switching to the ULE scheduler:

panic: lockmgr: thread 0xff0050858350, not exclusive lock holder 0xff00074959f0 unlocking
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
_lockmgr() at _lockmgr+0x872
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
null_unlock() at null_unlock+0xff
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
nullfs_mount() at nullfs_mount+0x244
vfs_donmount() at vfs_donmount+0xe4d
nmount() at nmount+0xa5
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp = 0x7fffdfc8, rbp = 0x7fffdfd0 ---

Thanks,
Jeff

On Sep 23, 2008, at 11:51 AM, Jeff Wheelhouse wrote:
> Got the following panic overnight:
>
> panic: lockmgr: thread 0xff0053cda680, not exclusive lock holder 0xff002d7da680 unlocking
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> panic() at panic+0x17a
> _lockmgr() at _lockmgr+0x872
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> null_unlock() at null_unlock+0xff
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> nullfs_mount() at nullfs_mount+0x244
> vfs_donmount() at vfs_donmount+0xe4d
> nmount() at nmount+0xa5
> syscall() at syscall+0x254
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp = 0x7fffdfb8, rbp = 0x7fffdfc0 ---
>
> I've done some searches, and "not exclusive lock holder" has been seen
> before, but I didn't find any previous reports related to nullfs, or with
> a stack trace anything like this one, on FreeBSD 7.
>
> This machine is diskless and thus cannot store a kernel dump.
>
> Ideas/suggestions for fixes, causes or debugging steps?
>
> The kernel is amd64, with the config shown below.
>
> Thanks,
> Jeff
>
> include GENERIC
> device carp
> device pf
> device pflog
> device pfsync
> options SW_WATCHDOG
> options DEVICE_POLLING
> options ALTQ
> options ALTQ_CBQ
> options ALTQ_RED
> options ALTQ_RIO
> options ALTQ_HFSC
> options ALTQ_PRIQ
> options ALTQ_NOPCC
> options KDB
> options KDB_UNATTENDED
> options KDB_TRACE
> options DDB
> options BREAK_TO_DEBUGGER
Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
On Wednesday 24 September 2008 12:17:56 pm Jeff Wheelhouse wrote:
> We got the same panic again, this time after switching to the ULE scheduler:
>
> panic: lockmgr: thread 0xff0050858350, not exclusive lock holder 0xff00074959f0 unlocking
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> panic() at panic+0x17a
> _lockmgr() at _lockmgr+0x872
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> null_unlock() at null_unlock+0xff
> VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> nullfs_mount() at nullfs_mount+0x244
> vfs_donmount() at vfs_donmount+0xe4d
> nmount() at nmount+0xa5
> syscall() at syscall+0x254
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp = 0x7fffdfc8, rbp = 0x7fffdfd0 ---

Can you use gdb or the like to get the source file/line for the
nullfs_mount+0x244 frame? i.e.

'gdb /boot/kernel/kernel'
(gdb) l *nullfs_mount+0x244

-- 
John Baldwin
Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
On Sep 24, 2008, at 12:34 PM, John Baldwin wrote:
> On Wednesday 24 September 2008 12:17:56 pm Jeff Wheelhouse wrote:
> > nullfs_mount() at nullfs_mount+0x244
>
> Can you use gdb or the like to get the source file/line for the
> nullfs_mount+0x244 frame? i.e.
>
> 'gdb /boot/kernel/kernel'
> (gdb) l *nullfs_mount+0x244

The running kernel was not built with -g, so I added it to the same config
and rebuilt. I will slip in a reboot ASAP and post more info after the next
panic.

Thanks for taking a look!

Thanks,
Jeff
Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
On Wednesday 24 September 2008 01:35:44 pm Jeff Wheelhouse wrote:
> On Sep 24, 2008, at 12:34 PM, John Baldwin wrote:
> > On Wednesday 24 September 2008 12:17:56 pm Jeff Wheelhouse wrote:
> > > nullfs_mount() at nullfs_mount+0x244
> >
> > Can you use gdb or the like to get the source file/line for the
> > nullfs_mount+0x244 frame? i.e.
> >
> > 'gdb /boot/kernel/kernel'
> > (gdb) l *nullfs_mount+0x244
>
> The running kernel was not built with -g, so I added it to the same config
> and rebuilt. I will slip in a reboot ASAP and post more info after the next
> panic.
>
> Thanks for taking a look!

If possible, get a crashdump.

-- 
John Baldwin
Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
On Sep 24, 2008, at 2:10 PM, John Baldwin wrote:
> If possible, get a crashdump.

gmirror. :(

Thanks,
Jeff
Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
On Wednesday 24 September 2008 02:15:59 pm Jeff Wheelhouse wrote:
> On Sep 24, 2008, at 2:10 PM, John Baldwin wrote:
> > If possible, get a crashdump.
>
> gmirror. :(

Gah.  Make pjd@ fix crashdumps on that. :P

-- 
John Baldwin
Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
On Sep 24, 2008, at 12:34 PM, John Baldwin wrote:
> On Wednesday 24 September 2008 12:17:56 pm Jeff Wheelhouse wrote:
> > panic: lockmgr: thread 0xff0050858350, not exclusive lock holder 0xff00074959f0 unlocking
> > cpuid = 0
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > panic() at panic+0x17a
> > _lockmgr() at _lockmgr+0x872
> > VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> > null_unlock() at null_unlock+0xff
> > VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
> > nullfs_mount() at nullfs_mount+0x244
> > vfs_donmount() at vfs_donmount+0xe4d
> > nmount() at nmount+0xa5
> > syscall() at syscall+0x254
> > Xfast_syscall() at Xfast_syscall+0xab
> > --- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp = 0x7fffdfc8, rbp = 0x7fffdfd0 ---
>
> Can you use gdb or the like to get the source file/line for the
> nullfs_mount+0x244 frame?

Got it again, this time with the full debug kernel, and I'm getting the same
weird results from gdb, so I'll go ahead and post it:

panic: lockmgr: thread 0xff0003e499f0, not exclusive lock holder 0xff000a5e16a0 unlocking
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
_lockmgr() at _lockmgr+0x872
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
null_unlock() at null_unlock+0xff
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
nullfs_mount() at nullfs_mount+0x244
vfs_donmount() at vfs_donmount+0xe4d
nmount() at nmount+0xa5
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp = 0x7fffe1c8, rbp = 0x7fffe1d0 ---

$ gdb /boot/kernel/nullfs.ko
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
(gdb) l *nullfs_mount+0x244
0x9c4 is in nullfs_mount (namei.h:163).
158             struct thread *td)
159     {
160             ndp->ni_cnd.cn_nameiop = op;
161             ndp->ni_cnd.cn_flags = flags;
162             ndp->ni_segflg = segflg;
163             ndp->ni_dirp = namep;
164             ndp->ni_cnd.cn_thread = td;
165     }
166
167     #define NDF_NO_DVP_RELE 0x0001
(gdb)

(That's NDINIT(), but line 163 doesn't look like it belongs in the middle of
a call stack. There's a VOP_UNLOCK a few lines above NDINIT() in
mount_nullfs(), and another one some ways farther on in the function.)

The good news is we took this particular machine out of production and came
up with a synthetic test based on our in-house code that can probably
reliably reproduce this within a few minutes. As you might expect, the test
involves hammering the same nullfs mount point with mounts and umounts from
multiple processes without any external synchronization.

Thanks,
Jeff
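The in-house test itself is not shown. What follows is only a hypothetical sketch of a comparable stress loop, in which several processes mount and unmount the same nullfs mount point with no coordination. It assumes nmount(2)'s convention of passing options as consecutive name/value iovec pairs. It also assumes the "fstype", "fspath" and "target" option names that mount_nullfs(8) passes; check these against the 7.0 sources. `build_nullfs_iov()` and `hammer_nullfs()` are illustrative names. The FreeBSD-only part is guarded so that the iovec builder also compiles elsewhere.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#ifdef __FreeBSD__
#include <sys/param.h>
#include <sys/mount.h>
#include <sys/wait.h>
#include <unistd.h>
#endif

/*
 * Fill six iovecs with name/value pairs for a nullfs mount.  Each length
 * includes the terminating NUL, as mount_nullfs(8) passes them.
 * Returns the number of iovecs filled.
 */
static int
build_nullfs_iov(struct iovec iov[6], const char *target, const char *fspath)
{
    const char *kv[6] = { "fstype", "nullfs", "fspath", fspath,
        "target", target };

    for (int i = 0; i < 6; i++) {
        iov[i].iov_base = (void *)kv[i];  /* nmount does not modify these */
        iov[i].iov_len = strlen(kv[i]) + 1;
    }
    return 6;
}

#ifdef __FreeBSD__
/*
 * Hypothetical stress loop: nprocs children each mount and unmount the same
 * mount point `iters` times with no coordination between them.  Failures
 * such as EBUSY are expected and ignored.  Run as root, on a test box only.
 */
static void
hammer_nullfs(const char *target, const char *fspath, int nprocs, int iters)
{
    struct iovec iov[6];
    int n = build_nullfs_iov(iov, target, fspath);

    for (int p = 0; p < nprocs; p++) {
        if (fork() == 0) {
            for (int i = 0; i < iters; i++) {
                (void)nmount(iov, n, 0);
                (void)unmount(fspath, 0);
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;  /* reap every child */
}
#endif
```

On FreeBSD, a call such as `hammer_nullfs("/some/dir", "/mnt/null", 8, 10000)` (placeholder paths) would run the loop. Ignoring errors is deliberate, since the aim is to maximize overlapping mount and unmount attempts rather than to succeed.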
panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64
Got the following panic overnight:

panic: lockmgr: thread 0xff0053cda680, not exclusive lock holder 0xff002d7da680 unlocking
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
_lockmgr() at _lockmgr+0x872
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
null_unlock() at null_unlock+0xff
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
nullfs_mount() at nullfs_mount+0x244
vfs_donmount() at vfs_donmount+0xe4d
nmount() at nmount+0xa5
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp = 0x7fffdfb8, rbp = 0x7fffdfc0 ---

I've done some searches, and "not exclusive lock holder" has been seen
before, but I didn't find any previous reports related to nullfs, or with a
stack trace anything like this one, on FreeBSD 7.

This machine is diskless and thus cannot store a kernel dump.

Ideas/suggestions for fixes, causes or debugging steps?

The kernel is amd64, with the config shown below.

Thanks,
Jeff

include GENERIC
device carp
device pf
device pflog
device pfsync
options SW_WATCHDOG
options DEVICE_POLLING
options ALTQ
options ALTQ_CBQ
options ALTQ_RED
options ALTQ_RIO
options ALTQ_HFSC
options ALTQ_PRIQ
options ALTQ_NOPCC
options KDB
options KDB_UNATTENDED
options KDB_TRACE
options DDB
options BREAK_TO_DEBUGGER