Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-25 Thread John Baldwin
On Thursday 25 September 2008 01:34:06 am Jeff Wheelhouse wrote:
 
 On Sep 24, 2008, at 12:34 PM, John Baldwin wrote:
 
  On Wednesday 24 September 2008 12:17:56 pm Jeff Wheelhouse wrote:
  panic: lockmgr: thread 0xff0050858350, not exclusive lock holder
  0xff00074959f0 unlocking
  cpuid = 0
  KDB: stack backtrace:
  db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
  panic() at panic+0x17a
  _lockmgr() at _lockmgr+0x872
  VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
  null_unlock() at null_unlock+0xff
  VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
  nullfs_mount() at nullfs_mount+0x244
  vfs_donmount() at vfs_donmount+0xe4d
  nmount() at nmount+0xa5
  syscall() at syscall+0x254
  Xfast_syscall() at Xfast_syscall+0xab
  --- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp =
  0x7fffdfc8, rbp = 0x7fffdfd0 ---
 
  Can you use gdb or the like to get the source file/line for the
  nullfs_mount+0x244 frame?
 
 Got it again, this time with the full debug kernel, and I'm getting  
 the same weird results from gdb, so I'll go ahead and post it:
 
 panic: lockmgr: thread 0xff0003e499f0, not exclusive lock holder  
 0xff000a5e16a0 unlocking
 cpuid = 0
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
 panic() at panic+0x17a
 _lockmgr() at _lockmgr+0x872
 VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
 null_unlock() at null_unlock+0xff
 VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
 nullfs_mount() at nullfs_mount+0x244
 vfs_donmount() at vfs_donmount+0xe4d
 nmount() at nmount+0xa5
 syscall() at syscall+0x254
 Xfast_syscall() at Xfast_syscall+0xab
 --- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp =  
 0x7fffe1c8, rbp = 0x7fffe1d0 ---
 
 $ gdb /boot/kernel/nullfs.ko
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as amd64-marcel-freebsd...
 (gdb) l *nullfs_mount+0x244
 0x9c4 is in nullfs_mount (namei.h:163).
 158   struct thread *td)
 159   {
 160   ndp->ni_cnd.cn_nameiop = op;
 161   ndp->ni_cnd.cn_flags = flags;
 162   ndp->ni_segflg = segflg;
 163   ndp->ni_dirp = namep;
 164   ndp->ni_cnd.cn_thread = td;
 165   }
 166   
 167   #define NDF_NO_DVP_RELE 0x0001
 (gdb)
 
 (That's NDINIT(), but line 163 doesn't look like it belongs in the  
 middle of a call stack.  There's a VOP_UNLOCK a few lines above  
 NDINIT() in nullfs_mount(), and another one some ways farther on in
 the function.)

It's probably the one just before the NDINIT (note that the return address in 
the call stack is pointing to the next instruction to be executed after the 
call to VOP_UNLOCK(), so sometimes it can end up referring to the next line 
in the source code from the actual function call):

	if ((mp->mnt_vnodecovered->v_op == &null_vnodeops) &&
	    VOP_ISLOCKED(mp->mnt_vnodecovered)) {
		VOP_UNLOCK(mp->mnt_vnodecovered, 0);
		isvnunlocked = 1;
	}
	/*
	 * Find lower node
	 */
	NDINIT(ndp, LOOKUP, FOLLOW|LOCKLEAF,
	    UIO_SYSSPACE, target, td);
	error = namei(ndp);

Can you 'p *mp'?  I'm curious if mp->mnt_vnodecovered is NULL (in which case, 
why didn't the two tests in the if() fail?)
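
A possibility worth ruling out alongside a NULL pointer: the panic says the
unlocking thread is "not exclusive lock holder", which is also what you would
expect if VOP_ISLOCKED() reports a lock held by some other thread as locked
(lockmgr has an LK_EXCLOTHER status for that case), so the if() passes and the
VOP_UNLOCK() that follows is done by a non-owner.  The C below is a minimal
user-space model of that guard, not kernel code: every name in it is
hypothetical, and whether 7.0's VOP_ISLOCKED() returns nonzero for a lock
another thread holds is an assumption here, not something this thread
established.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model of the guard in nullfs_mount().
 * None of these names are kernel APIs; the states only loosely
 * mirror lockmgr's "held exclusively by me" vs "by another thread".
 */
enum model_state { MODEL_UNLOCKED = 0, MODEL_EXCL_MINE, MODEL_EXCL_OTHER };

struct model_lock {
	int owner;		/* id of the exclusive holder, 0 when free */
};

/* ISLOCKED-style query, answered relative to the calling thread td. */
static enum model_state
model_islocked(const struct model_lock *lk, int td)
{
	assert(lk != NULL && td != 0);
	if (lk->owner == 0)
		return (MODEL_UNLOCKED);
	return (lk->owner == td ? MODEL_EXCL_MINE : MODEL_EXCL_OTHER);
}

/*
 * Release path: only the exclusive holder may unlock.  Returns -1
 * where lockmgr would panic with "not exclusive lock holder".
 */
static int
model_unlock(struct model_lock *lk, int td)
{
	if (lk->owner != td)
		return (-1);
	lk->owner = 0;
	return (0);
}

/* The guard as quoted above: any nonzero state counts as "locked". */
static int
guard_any_holder(const struct model_lock *lk, int td)
{
	return (model_islocked(lk, td) != MODEL_UNLOCKED);
}

/* A stricter guard: only unlock what this thread holds exclusively. */
static int
guard_own_holder(const struct model_lock *lk, int td)
{
	return (model_islocked(lk, td) == MODEL_EXCL_MINE);
}
```

With covered.owner = 2 and the mounting thread as td = 1,
guard_any_holder() is true and model_unlock() returns -1 (the modeled
panic), while guard_own_holder() is false and the unlock is simply skipped.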

 The good news is we took this particular machine out of production and  
 came up with a synthetic test based on our in-house code that can  
 probably reliably reproduce this within a few minutes.  As you might  
 expect, the test involves hammering the same nullfs mount point with  
 mounts and umounts from multiple processes without any external  
 synchronization.

Ok.  Reproducibility is good. :)

-- 
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-25 Thread Jeff Wheelhouse


On Sep 25, 2008, at 8:45 AM, John Baldwin wrote:
It's probably the one just before the NDINIT (note that the return
address in the call stack is pointing to the next instruction to be
executed after the call to VOP_UNLOCK(), so sometimes it can end up
referring to the next line in the source code from the actual
function call):


Seems like we're six or seven lines of source down, not on the next  
line, which was the source of my confusion.  But if you're not  
confused, I won't be. :)
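
The six-or-seven-line gap can be sketched as a line-table lookup.  gdb placed
nullfs_mount+0x244 at namei.h:163, which suggests NDINIT() was inlined into
nullfs_mount(); if so, the instruction right after the VOP_UNLOCK() call is
the start of the inlined NDINIT() body and is attributed to namei.h, not to
the line after the call in the .c file.  Looking up the return address minus
one lands back inside the call instruction.  The table, addresses and .c line
number used with this sketch are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical line table: entry i covers addresses from tab[i].start
 * up to (but not including) tab[i + 1].start.  Must be sorted.
 */
struct line_entry {
	uintptr_t	 start;
	const char	*file;
	int		 line;
};

/* Return the entry covering addr, or NULL if addr precedes the table. */
static const struct line_entry *
line_lookup(const struct line_entry *tab, size_t n, uintptr_t addr)
{
	const struct line_entry *hit = NULL;

	for (size_t i = 0; i < n && tab[i].start <= addr; i++)
		hit = &tab[i];
	return (hit);
}

/*
 * A return address points just past the call instruction, so it may
 * already belong to the next source location (e.g. inlined header
 * code).  Looking up ra - 1 lands inside the call instruction itself.
 */
static const struct line_entry *
call_site_lookup(const struct line_entry *tab, size_t n, uintptr_t ra)
{
	assert(ra > 0);
	return (line_lookup(tab, n, ra - 1));
}
```

For example, with a made-up table { { 0x9b0, "null_vfsops.c", 100 },
{ 0x9c4, "namei.h", 163 } }, line_lookup(tab, 2, 0x9c4) reports namei.h:163,
like the gdb output, while call_site_lookup(tab, 2, 0x9c4) reports the
hypothetical null_vfsops.c:100 entry.  In gdb, the analogous check would
presumably be 'l *nullfs_mount+0x243', one byte before the return address;
that was not tried in this thread.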


Can you 'p *mp'?  I'm curious if mp->mnt_vnodecovered is NULL (in
which case, why didn't the two tests in the if() fail?)


Apparently I can't; we're stuck with DDB since we can't get a crash  
dump and the serial console goes to a hardware terminal server.  I'm  
afraid I'm not quite clever enough to find the right data structure  
without symbols.


I could try to throw a printf in there, or add a panic if
mp->mnt_vnodecovered is NULL, if you think that would help.  The printf
will probably significantly alter timings, so I might need some  
guidance as far as what to print, and under what conditions.


Thanks,
Jeff

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-25 Thread John Baldwin
On Thursday 25 September 2008 03:29:20 pm Jeff Wheelhouse wrote:
 
 On Sep 25, 2008, at 8:45 AM, John Baldwin wrote:
  It's probably the one just before the NDINIT (note that the return
  address in the call stack is pointing to the next instruction to be
  executed after the call to VOP_UNLOCK(), so sometimes it can end up
  referring to the next line in the source code from the actual
  function call):
 
 Seems like we're six or seven lines of source down, not on the next  
 line, which was the source of my confusion.  But if you're not  
 confused, I won't be. :)
 
  Can you 'p *mp'?  I'm curious if mp->mnt_vnodecovered is NULL (in
  which case, why didn't the two tests in the if() fail?)
 
 Apparently I can't; we're stuck with DDB since we can't get a crash  
 dump and the serial console goes to a hardware terminal server.  I'm  
 afraid I'm not quite clever enough to find the right data structure  
 without symbols.
 
 I could try to throw a printf in there, or add a panic if
 mp->mnt_vnodecovered is NULL, if you think that would help.  The printf
 will probably significantly alter timings, so I might need some  
 guidance as far as what to print, and under what conditions.

You can use KTR instead of printf perhaps and then use 'show ktr' from DDB.  
This won't have the same impact on timing as printf().  I would include PIDs 
in any KTR traces you do so it's easier to parse the interleaved entries from 
multiple CPUs.  Also, if you have a good test case, it might be worth 
grabbing a box w/o gmirror that can generate a crashdump and reproduce it 
there.
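
KTR is cheaper than printf() because a trace point only copies a format
pointer and a few raw arguments into a fixed-size ring buffer; formatting
happens later, when you dump it with 'show ktr'.  The C below is a user-space
model of that idea with hypothetical names, not the KTR implementation.  In
the kernel itself the trace point would be one of the CTRn() macros, for
example CTR3(KTR_VFS, "nullfs_mount: pid %d mp %p covered %p",
curthread->td_proc->p_pid, mp, mp->mnt_vnodecovered), with options KTR in the
kernel config; check ktr(4) for how to enable the trace class at compile and
run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of a KTR-style trace buffer: each
 * trace point stores a format pointer and raw arguments in a ring,
 * and formatting is deferred until the buffer is dumped.
 */
#define TRACE_ENTRIES	8	/* must be a power of two */

struct trace_entry {
	uint64_t	 seq;		/* global order, across CPUs */
	int		 cpu;		/* CPU that recorded the entry */
	int		 pid;		/* passed explicitly, to untangle interleaving */
	const char	*fmt;		/* stored, not expanded, at trace time */
	uintptr_t	 arg[2];	/* raw values, e.g. mp and its covered vnode */
};

struct trace_buf {
	struct trace_entry ent[TRACE_ENTRIES];
	uint64_t next;			/* number of entries ever recorded */
};

/* Record one entry; once the ring is full the oldest entry is overwritten. */
static void
trace_record(struct trace_buf *tb, int cpu, int pid, const char *fmt,
    uintptr_t a0, uintptr_t a1)
{
	struct trace_entry *e;

	assert(tb != NULL && fmt != NULL);
	e = &tb->ent[tb->next & (TRACE_ENTRIES - 1)];
	e->seq = tb->next++;
	e->cpu = cpu;
	e->pid = pid;
	e->fmt = fmt;
	e->arg[0] = a0;
	e->arg[1] = a1;
}

/* Return the i-th newest surviving entry (0 = most recent), or NULL. */
static const struct trace_entry *
trace_newest(const struct trace_buf *tb, size_t i)
{
	uint64_t kept = tb->next < TRACE_ENTRIES ? tb->next : TRACE_ENTRIES;

	if (i >= kept)
		return (NULL);
	return (&tb->ent[(tb->next - 1 - i) & (TRACE_ENTRIES - 1)]);
}
```

Recording ten entries into this eight-slot ring keeps only the last eight;
the pid field is what lets you pick one process's entries out of the
interleaved output from several CPUs.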

-- 
John Baldwin


Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-25 Thread Jeff Wheelhouse


On Sep 25, 2008, at 3:53 PM, John Baldwin wrote:
You can use KTR instead of printf perhaps and then use 'show ktr'
from DDB.  This won't have the same impact on timing as printf().  I
would include PIDs in any KTR traces you do so it's easier to parse
the interleaved entries from multiple CPUs.


OK, while I am educating myself about how KTR works, what would you
like to see?  Just mp->mnt_vnodecovered?



 Also, if you have a good test case, it might be worth grabbing a box
w/o gmirror that can generate a crashdump and reproduce it there.


Not an option for us right now; spare 8-core boxes are hard to come  
by.  We're looking for a USB hard drive or something we can dump to.


Thanks,
Jeff



Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-25 Thread Antony Mawer

Jeff Wheelhouse wrote:

 Also, if you have a good test case, it might be worth
grabbing a box w/o gmirror that can generate a crashdump and reproduce it
there.


Not an option for us right now; spare 8-core boxes are hard to come by.  
We're looking for a USB hard drive or something we can dump to.


Can you set your dump device to the underlying GEOM component's swap 
partition rather than to the gmirror device...?


-- Antony


Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-24 Thread Jeff Wheelhouse


We got the same panic again, this time after switching to the ULE  
scheduler:


panic: lockmgr: thread 0xff0050858350, not exclusive lock holder  
0xff00074959f0 unlocking
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
_lockmgr() at _lockmgr+0x872
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
null_unlock() at null_unlock+0xff
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
nullfs_mount() at nullfs_mount+0x244
vfs_donmount() at vfs_donmount+0xe4d
nmount() at nmount+0xa5
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp =  
0x7fffdfc8, rbp = 0x7fffdfd0 ---


Thanks,
Jeff

On Sep 23, 2008, at 11:51 AM, Jeff Wheelhouse wrote:



Got the following panic overnight:

panic: lockmgr: thread 0xff0053cda680, not exclusive lock holder  
0xff002d7da680 unlocking
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
_lockmgr() at _lockmgr+0x872
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
null_unlock() at null_unlock+0xff
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
nullfs_mount() at nullfs_mount+0x244
vfs_donmount() at vfs_donmount+0xe4d
nmount() at nmount+0xa5
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp =  
0x7fffdfb8, rbp = 0x7fffdfc0 ---


I've done some searches and "not exclusive lock holder" has been
seen before, but I didn't find any previous reports related to
nullfs with a stack trace anything like this on FreeBSD 7.


This machine is diskless and thus cannot store a kernel dump.
Ideas/suggestions for fixes, causes or debugging steps?


The kernel is amd64, with config shown below.

Thanks,
Jeff

include GENERIC

device  carp
device  pf
device  pflog
device  pfsync

options SW_WATCHDOG
options DEVICE_POLLING

options ALTQ
options ALTQ_CBQ
options ALTQ_RED
options ALTQ_RIO
options ALTQ_HFSC
options ALTQ_PRIQ
options ALTQ_NOPCC

options KDB
options KDB_UNATTENDED
options KDB_TRACE
options DDB
options BREAK_TO_DEBUGGER




Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-24 Thread John Baldwin
On Wednesday 24 September 2008 12:17:56 pm Jeff Wheelhouse wrote:
 
 We got the same panic again, this time after switching to the ULE  
 scheduler:
 
 panic: lockmgr: thread 0xff0050858350, not exclusive lock holder  
 0xff00074959f0 unlocking
 cpuid = 0
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
 panic() at panic+0x17a
 _lockmgr() at _lockmgr+0x872
 VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
 null_unlock() at null_unlock+0xff
 VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
 nullfs_mount() at nullfs_mount+0x244
 vfs_donmount() at vfs_donmount+0xe4d
 nmount() at nmount+0xa5
 syscall() at syscall+0x254
 Xfast_syscall() at Xfast_syscall+0xab
 --- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp =  
 0x7fffdfc8, rbp = 0x7fffdfd0 ---

Can you use gdb or the like to get the source file/line for the 
nullfs_mount+0x244 frame?

i.e. 'gdb /boot/kernel/kernel'

(gdb) l *nullfs_mount+0x244

-- 
John Baldwin


Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-24 Thread Jeff Wheelhouse

On Sep 24, 2008, at 12:34 PM, John Baldwin wrote:

On Wednesday 24 September 2008 12:17:56 pm Jeff Wheelhouse wrote:

nullfs_mount() at nullfs_mount+0x244


Can you use gdb or the like to get the source file/line for the
nullfs_mount+0x244 frame?

i.e. 'gdb /boot/kernel/kernel'

(gdb) l *nullfs_mount+0x244


The running kernel did not have -g so I added it to the same config  
and rebuilt.  I will slip in a reboot ASAP and post more info after  
the next panic.


Thanks for taking a look!

Thanks,
Jeff



Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-24 Thread John Baldwin
On Wednesday 24 September 2008 01:35:44 pm Jeff Wheelhouse wrote:
 On Sep 24, 2008, at 12:34 PM, John Baldwin wrote:
  On Wednesday 24 September 2008 12:17:56 pm Jeff Wheelhouse wrote:
  nullfs_mount() at nullfs_mount+0x244
 
  Can you use gdb or the like to get the source file/line for the
  nullfs_mount+0x244 frame?
 
  i.e. 'gdb /boot/kernel/kernel'
 
  (gdb) l *nullfs_mount+0x244
 
 The running kernel did not have -g so I added it to the same config  
 and rebuilt.  I will slip in a reboot ASAP and post more info after  
 the next panic.
 
 Thanks for taking a look!

If possible, get a crashdump.

-- 
John Baldwin


Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-24 Thread Jeff Wheelhouse


On Sep 24, 2008, at 2:10 PM, John Baldwin wrote:

If possible, get a crashdump.


gmirror. :(

Thanks,
Jeff



Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-24 Thread John Baldwin
On Wednesday 24 September 2008 02:15:59 pm Jeff Wheelhouse wrote:
 
 On Sep 24, 2008, at 2:10 PM, John Baldwin wrote:
  If possible, get a crashdump.
 
 gmirror. :(

Gah.  Make pjd@ fix crashdumps on that. :P

-- 
John Baldwin


Re: panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-24 Thread Jeff Wheelhouse


On Sep 24, 2008, at 12:34 PM, John Baldwin wrote:


On Wednesday 24 September 2008 12:17:56 pm Jeff Wheelhouse wrote:

panic: lockmgr: thread 0xff0050858350, not exclusive lock holder
0xff00074959f0 unlocking
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
_lockmgr() at _lockmgr+0x872
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
null_unlock() at null_unlock+0xff
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
nullfs_mount() at nullfs_mount+0x244
vfs_donmount() at vfs_donmount+0xe4d
nmount() at nmount+0xa5
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp =
0x7fffdfc8, rbp = 0x7fffdfd0 ---


Can you use gdb or the like to get the source file/line for the
nullfs_mount+0x244 frame?


Got it again, this time with the full debug kernel, and I'm getting  
the same weird results from gdb, so I'll go ahead and post it:


panic: lockmgr: thread 0xff0003e499f0, not exclusive lock holder  
0xff000a5e16a0 unlocking
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
_lockmgr() at _lockmgr+0x872
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
null_unlock() at null_unlock+0xff
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
nullfs_mount() at nullfs_mount+0x244
vfs_donmount() at vfs_donmount+0xe4d
nmount() at nmount+0xa5
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp =  
0x7fffe1c8, rbp = 0x7fffe1d0 ---


$ gdb /boot/kernel/nullfs.ko
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as amd64-marcel-freebsd...
(gdb) l *nullfs_mount+0x244
0x9c4 is in nullfs_mount (namei.h:163).
158 struct thread *td)
159 {
160 ndp->ni_cnd.cn_nameiop = op;
161 ndp->ni_cnd.cn_flags = flags;
162 ndp->ni_segflg = segflg;
163 ndp->ni_dirp = namep;
164 ndp->ni_cnd.cn_thread = td;
165 }
166 
167 #define NDF_NO_DVP_RELE 0x0001
(gdb)

(That's NDINIT(), but line 163 doesn't look like it belongs in the  
middle of a call stack.  There's a VOP_UNLOCK a few lines above  
NDINIT() in nullfs_mount(), and another one some ways farther on in
the function.)


The good news is we took this particular machine out of production and  
came up with a synthetic test based on our in-house code that can  
probably reliably reproduce this within a few minutes.  As you might  
expect, the test involves hammering the same nullfs mount point with  
mounts and umounts from multiple processes without any external  
synchronization.


Thanks,
Jeff



panic: lockmgr on FreeBSD 7.0-RELEASE-p4 amd64

2008-09-23 Thread Jeff Wheelhouse


Got the following panic overnight:

panic: lockmgr: thread 0xff0053cda680, not exclusive lock holder  
0xff002d7da680 unlocking
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x17a
_lockmgr() at _lockmgr+0x872
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
null_unlock() at null_unlock+0xff
VOP_UNLOCK_APV() at VOP_UNLOCK_APV+0x46
nullfs_mount() at nullfs_mount+0x244
vfs_donmount() at vfs_donmount+0xe4d
nmount() at nmount+0xa5
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x206845ac, rsp =  
0x7fffdfb8, rbp = 0x7fffdfc0 ---


I've done some searches and "not exclusive lock holder" has been seen
before, but I didn't find any previous reports related to nullfs with
a stack trace anything like this on FreeBSD 7.


This machine is diskless and thus cannot store a kernel dump.
Ideas/suggestions for fixes, causes or debugging steps?


The kernel is amd64, with config shown below.

Thanks,
Jeff

include GENERIC

device  carp
device  pf
device  pflog
device  pfsync

options SW_WATCHDOG
options DEVICE_POLLING

options ALTQ
options ALTQ_CBQ
options ALTQ_RED
options ALTQ_RIO
options ALTQ_HFSC
options ALTQ_PRIQ
options ALTQ_NOPCC

options KDB
options KDB_UNATTENDED
options KDB_TRACE
options DDB
options BREAK_TO_DEBUGGER

