On 09/18/06 10:02, Eric Anderson wrote:
Hi all,

On one of our NFS servers, we've seen repeated filesystem issues with two of the four filesystems it exports via NFS. The problem usually shows up as a hung 'df -lk' (wedged in 'ufs'); mountd also wedges, refuses new mounts, and cannot be killed. From an NFS client, one can continue using the filesystem without any trouble. From the server itself, you can cd to the filesystem's root directory, but an ls will hang. Running a background fsck on that filesystem while in this state also blocks on ufs. My nfsd processes will also get stuck in the 'D' state (in 'ufs'), but they still appear to be serving data. About a month ago, I brought the system down, did a full fsck on all the filesystems, and brought it back up. It survived for 2-3 weeks, but is now doing the same thing, so I doubt the fsck had any effect on the issue.

This morning, prior to rebooting the system to get it out of this state, I began unmounting filesystems in case of a panic, and after unmounting (successfully) two of the filesystems (the ones I've never seen an issue on), I tried unmounting the third (/scr02), and a panic ensued. /scr01 is the other filesystem that is giving me issues.

Some information about the system/setup:

FreeBSD smd2.centtech.com 6.1-STABLE FreeBSD 6.1-STABLE #0: Sat Aug 12 13:24:02 CDT 2006

# df -ilk
Filesystem      1K-blocks      Used     Avail Capacity  iused    ifree %iused Mounted on
/dev/amrd0s1a    20308398   3098864  15584864      17% 259261  2378561    10% /
devfs                   1         1         0     100%      0        0   100% /dev
/dev/amrd0s1d    13065232   3960250   8059764      33%    870  1694872     0% /var
/dev/ufs/rss    213268540  93886480 102320578      48% 399297 27180093     1% /rss
/dev/ufs/scr02  213268540 116904962  79302096      60% 426573 27152817     2% /scr02
/dev/ufs/scr04  167568544  93374026  60789036      61%  13008 21654830     0% /scr04
/dev/ufs/scr01  232100360 161547746  51984586      76% 531834 29473412     2% /scr01

(rss and scr04 never give me any issues)

All four of the ufs/* partitions are on the same RAID array, and I don't believe there is any underlying disk issue.

Here's some kgdb output from when the system was wedged on /scr01, but the unmount of /scr02 caused a panic:

# kgdb -q -n 3
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]

Unread portion of the kernel message buffer:
Mount point /scr02 had 1 dangling refs
panic: unmount: dangling vnode
cpuid = 0
KDB: enter: panic
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 1023MB (261824 pages) 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
         in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1 0xc0473b9b in db_fncall (dummy1=-1063129632, dummy2=0, dummy3=-1064859081, dummy4=0xe8de3ab8 "ä:Þè\234l\207ÀÐ:ÞèÔ:Þè\220\a")
     at /usr/src/sys/ddb/db_command.c:492
#2 0xc04739a0 in db_command (last_cmdp=0xc09d0144, cmd_table=0x0, aux_cmd_tablep=0xc092fe4c, aux_cmd_tablep_end=0xc092fe68)
     at /usr/src/sys/ddb/db_command.c:350
#3  0xc0473a68 in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
#4  0xc0475679 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5 0xc0697c0c in kdb_trap (type=3, code=0, tf=0xe8de3bfc) at /usr/src/sys/kern/subr_kdb.c:473
#6  0xc0896338 in trap (frame=
{tf_fs = -388104184, tf_es = -1066860504, tf_ds = -1064304600, tf_edi = -1064235220, tf_esi = 1, tf_ebp = -388088772, tf_isp = -388088792, tf_ebx = -388088728, tf_edx = 0, tf_ecx = -1056755712, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1066829453, tf_cs = 32, tf_eflags = 646, tf_esp = -388088740, tf_ss = -1066934521}) at /usr/src/sys/i386/i386/trap.c:593
#7  0xc0881e5a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#8  0xc0697973 in kdb_enter (msg=0x12 <Address 0x12 out of bounds>) at cpufunc.h:60
#9  0xc067df07 in panic (fmt=0xc0910f2c "unmount: dangling vnode") at /usr/src/sys/kern/kern_shutdown.c:549
#10 0xc06d153e in vfs_mount_destroy (mp=0xc5964000, td=0xc620c600) at /usr/src/sys/kern/vfs_mount.c:514
#11 0xc06d2d26 in dounmount (mp=0xc5964000, flags=134217728, td=0xc620c600) at /usr/src/sys/kern/vfs_mount.c:1162
#12 0xc06d27de in unmount (td=0xc620c600, uap=0xe8de3d04) at /usr/src/sys/kern/vfs_mount.c:1052
#13 0xc0896c0b in syscall (frame=
{tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134521957, tf_esi = 134535289, tf_ebp = -1077942776, tf_isp = -388088476, tf_ebx = -1077942864, tf_edx = 26, tf_ecx = 0, tf_eax = 22, tf_trapno = 12, tf_err = 2, tf_eip = 671864503, tf_cs = 51, tf_eflags = 518, tf_esp = -1077942948, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:981
#14 0xc0881eaf in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#15 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 10
#10 0xc06d153e in vfs_mount_destroy (mp=0xc5964000, td=0xc620c600) at /usr/src/sys/kern/vfs_mount.c:514
514                     panic("unmount: dangling vnode");
(kgdb) l
509                     printf("mount point secondary write ops completed\n");
510             }
511             MNT_IUNLOCK(mp);
512             mp->mnt_vfc->vfc_refcount--;
513             if (!TAILQ_EMPTY(&mp->mnt_nvnodelist))
514                     panic("unmount: dangling vnode");
515             lockdestroy(&mp->mnt_lock);
516             MNT_ILOCK(mp);
517             if (mp->mnt_kern_flag & MNTK_MWAIT)
518                     wakeup(mp);

(kgdb) p *mp
$2 = {mnt_list = {tqe_next = 0xc5964400, tqe_prev = 0xc59bbc00}, mnt_op = 0xc09b96e0, mnt_vfc = 0xc09b9720, mnt_vnodecovered = 0xc5ae0cc0, mnt_syncer = 0x0, mnt_nvnodelist = {tqh_first = 0xc6d59440, tqh_last = 0xc6d59454}, mnt_lock = {lk_interlock = 0xc09eac84, lk_flags = 1048576, lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0, lk_prio = 80, lk_wmesg = 0xc0910dff "vfslock", lk_timo = 0, lk_lockholder = 0xffffffff, lk_newlock = 0x0}, mnt_mtx = {mtx_object = {lo_class = 0xc0980124, lo_name = 0xc0910dee "struct mount mtx", lo_type = 0xc0910dee "struct mount mtx", lo_flags = 196608, lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0}, mnt_writeopcount = 0, mnt_flag = 2097920, mnt_opt = 0xc5926a00, mnt_optnew = 0x0, mnt_kern_flag = 553648128, mnt_maxsymlinklen = 120, mnt_stat = {f_version = 537068824, f_type = 5, f_flags = 2102016, f_bsize = 2048, f_iosize = 16384, f_blocks = 106634270, f_bfree = 48180134, f_bavail = 39649393, f_files = 27579390, f_ffree = 27152822, f_syncwrites = 0, f_asyncwrites = 0, f_syncreads = 0, f_asyncreads = 0, f_spare = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, f_namemax = 255, f_owner = 0, f_fsid = {val = {1111508928, -571478071}}, f_charspare = '\0' <repeats 79 times>, f_fstypename = "ufs", '\0' <repeats 12 times>, f_mntfromname = "/dev/ufs/scr02", '\0' <repeats 73 times>, f_mntonname = "/scr02", '\0' <repeats 81 times>}, mnt_cred = 0xc59f2080, mnt_data = 0x0, mnt_time = 0, mnt_iosize_max = 131072, mnt_export = 0xc5d25c00, mnt_mntlabel = 0x0, mnt_fslabel = 0x0, mnt_nvnodelistsize = 1, mnt_hashseed = 3369618744, mnt_markercnt = 0, mnt_holdcnt = 0, mnt_holdcntwaiters = 0, mnt_secondary_writes = 0,
   mnt_secondary_accwrites = 2126786, mnt_ref = 1}
(kgdb) p mp->mnt_vfc->vfc_refcount
$3 = 4


Anything else I can provide to help find the issue?


Eric





Another batch of kgdb output from this same system, with the same issue:

# kgdb -q -n 1
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]

Unread portion of the kernel message buffer:
Mount point /rss had 1 dangling refs
panic: unmount: dangling vnode
cpuid = 0
KDB: enter: panic
Dumping 1023 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
chunk 1: 1023MB (261824 pages) 1007 991 975 959 943 927 911 895 879 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1 0xc0473b9b in db_fncall (dummy1=-1063129632, dummy2=0, dummy3=-1064859081, dummy4=0xe8e65ab8 "äZæè\234l\207ÀÐZæèÔZæè\220\a")
    at /usr/src/sys/ddb/db_command.c:492
#2 0xc04739a0 in db_command (last_cmdp=0xc09d0144, cmd_table=0x0, aux_cmd_tablep=0xc092fe4c, aux_cmd_tablep_end=0xc092fe68)
    at /usr/src/sys/ddb/db_command.c:350
#3  0xc0473a68 in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
#4  0xc0475679 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5 0xc0697c0c in kdb_trap (type=3, code=0, tf=0xe8e65bfc) at /usr/src/sys/kern/subr_kdb.c:473
#6  0xc0896338 in trap (frame=
{tf_fs = -387579896, tf_es = -1066860504, tf_ds = -1064304600, tf_edi = -1064235220, tf_esi = 1, tf_ebp = -387556292, tf_isp = -387556312, tf_ebx = -387556248, tf_edx = 0, tf_ecx = -1056755712, tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1066829453, tf_cs = 32, tf_eflags = 646, tf_esp = -387556260, tf_ss = -1066934521}) at /usr/src/sys/i386/i386/trap.c:593
#7  0xc0881e5a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#8  0xc0697973 in kdb_enter (msg=0x12 <Address 0x12 out of bounds>) at cpufunc.h:60
#9  0xc067df07 in panic (fmt=0xc0910f2c "unmount: dangling vnode") at /usr/src/sys/kern/kern_shutdown.c:549
#10 0xc06d153e in vfs_mount_destroy (mp=0xc59bbc00, td=0xc5c16000) at /usr/src/sys/kern/vfs_mount.c:514
#11 0xc06d2d26 in dounmount (mp=0xc59bbc00, flags=134217728, td=0xc5c16000) at /usr/src/sys/kern/vfs_mount.c:1162
#12 0xc06d27de in unmount (td=0xc5c16000, uap=0xe8e65d04) at /usr/src/sys/kern/vfs_mount.c:1052
#13 0xc0896c0b in syscall (frame=
{tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134521957, tf_esi = 134534817, tf_ebp = -1077942776, tf_isp = -387555996, tf_ebx = -1077942864, tf_edx = 25, tf_ecx = 0, tf_eax = 22, tf_trapno = 12, tf_err = 2, tf_eip = 671864503, tf_cs = 51, tf_eflags = 518, tf_esp = -1077942948, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:981
#14 0xc0881eaf in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#15 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 10
#10 0xc06d153e in vfs_mount_destroy (mp=0xc59bbc00, td=0xc5c16000) at /usr/src/sys/kern/vfs_mount.c:514
514                     panic("unmount: dangling vnode");
(kgdb) l
509                     printf("mount point secondary write ops completed\n");
510             }
511             MNT_IUNLOCK(mp);
512             mp->mnt_vfc->vfc_refcount--;
513             if (!TAILQ_EMPTY(&mp->mnt_nvnodelist))
514                     panic("unmount: dangling vnode");
515             lockdestroy(&mp->mnt_lock);
516             MNT_ILOCK(mp);
517             if (mp->mnt_kern_flag & MNTK_MWAIT)
518                     wakeup(mp);
(kgdb) p mp->mnt_vfc->vfc_refcount
$1 = 4
(kgdb) p *mp
$2 = {mnt_list = {tqe_next = 0xc595d000, tqe_prev = 0xc59bc000}, mnt_op = 0xc09b96e0, mnt_vfc = 0xc09b9720, mnt_vnodecovered = 0xc5a81cc0, mnt_syncer = 0x0, mnt_nvnodelist = {tqh_first = 0xc8af4000, tqh_last = 0xc8af4014}, mnt_lock = {lk_interlock = 0xc09eac18, lk_flags = 1048576, lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0, lk_prio = 80, lk_wmesg = 0xc0910dff "vfslock", lk_timo = 0, lk_lockholder = 0xffffffff, lk_newlock = 0x0}, mnt_mtx = {mtx_object = {lo_class = 0xc0980124, lo_name = 0xc0910dee "struct mount mtx", lo_type = 0xc0910dee "struct mount mtx", lo_flags = 196608, lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0}, mtx_lock = 4, mtx_recurse = 0}, mnt_writeopcount = 0, mnt_flag = 2097920, mnt_opt = 0xc5728a40, mnt_optnew = 0x0, mnt_kern_flag = 553648128, mnt_maxsymlinklen = 120, mnt_stat = {f_version = 537068824, f_type = 5, f_flags = 2102016, f_bsize = 2048, f_iosize = 16384, f_blocks = 106634270, f_bfree = 65410962, f_bavail = 56880221, f_files = 27579390, f_ffree = 27203064, f_syncwrites = 0, f_asyncwrites = 0, f_syncreads = 0, f_asyncreads = 0, f_spare = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, f_namemax = 255, f_owner = 0, f_fsid = {val = {1111508926, 499625180}}, f_charspare = '\0' <repeats 79 times>, f_fstypename = "ufs", '\0' <repeats 12 times>, f_mntfromname = "/dev/ufs/rss", '\0' <repeats 75 times>, f_mntonname = "/rss", '\0' <repeats 83 times>}, mnt_cred = 0xc5a56c80, mnt_data = 0x0, mnt_time = 0, mnt_iosize_max = 131072, mnt_export = 0xc59e8000, mnt_mntlabel = 0x0, mnt_fslabel = 0x0, mnt_nvnodelistsize = 1, mnt_hashseed = 2115021039, mnt_markercnt = 0, mnt_holdcnt = 0, mnt_holdcntwaiters = 0, mnt_secondary_writes = 0,
  mnt_secondary_accwrites = 12553194, mnt_ref = 1}
(kgdb) p &mp->mnt_nvnodelist
$3 = (struct vnodelst *) 0xc59bbc18
(kgdb) p mp->mnt_nvnodelist
$4 = {tqh_first = 0xc8af4000, tqh_last = 0xc8af4014}


Eric



--
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------