Re: NFS ( amd?) dysfunction descending a hierarchy
On Thu, Dec 11, 2008 at 02:53:49PM -0800, David Wolfskill wrote:

On Wed, Dec 10, 2008 at 07:06:20PM +0200, Kostik Belousov wrote: ...

What concerns me is that even if the attempted unmount gets EBUSY, the user-level process descending the directory hierarchy is getting ENOENT trying to issue fstatfs() against an open file descriptor. I'm having trouble figuring out any way that makes any sense.

Basically, the problem is that NFS uses shared lookup, and this allows for a bug where several negative namecache entries are created for a non-existent node. Then the node gets created, removing only the first negative namecache entry. For some reason the vnode is then reclaimed; amd's trial unmount is a good reason for the vnode to be reclaimed. Now you have an existing path and a negative cache entry. This was first reported by Peter Holm; I listed the relevant revisions that should fix this in a previous mail.

Well, I messed up the machine I had been using for testing, and needed to wait for IT to do something to it, since I don't have physical or console access to it. So after I happened to demonstrate the effect using my desktop -- which had been running RELENG_7_1, sources updated as of around 0400 hrs. US/Pacific -- I decided to go ahead and update the desktop to RELENG_7_1 as of this morning (which had the commit to sys/kern/vfs_cache.c), then test. It still failed, apparently in the same way; details below.
First, here's a list of the files that were changed:

U lib/libarchive/archive_read_support_format_iso9660.c
U lib/libarchive/archive_string.c
U lib/libarchive/archive_string.h
U lib/libc/gen/times.3
U lib/libc/i386/sys/pipe.S
U lib/libc/i386/sys/reboot.S
U lib/libc/i386/sys/setlogin.S
U lib/libutil/Makefile
U lib/libutil/kinfo_getfile.c
U lib/libutil/kinfo_getvmmap.c
U lib/libutil/libutil.h
U share/man/man4/bce.4
U share/man/man5/Makefile
U share/man/man5/fstab.5
U share/man/man5/nullfs.5
U sys/amd64/Makefile
U sys/boot/forth/loader.conf.5
U sys/dev/ale/if_ale.c
U sys/dev/bce/if_bce.c
U sys/dev/cxgb/cxgb_main.c
U sys/dev/cxgb/common/cxgb_ael1002.c
U sys/dev/cxgb/common/cxgb_t3_hw.c
U sys/dev/cxgb/common/cxgb_xgmac.c
U sys/dev/re/if_re.c
U sys/fs/nullfs/null_vnops.c
U sys/kern/Make.tags.inc
U sys/kern/kern_descrip.c
U sys/kern/kern_proc.c
U sys/kern/vfs_cache.c
U sys/netinet/in_pcb.h
U sys/pci/if_rlreg.h
U sys/sys/sysctl.h
U sys/sys/user.h
U sys/ufs/ufs/ufs_quota.c
U usr.bin/procstat/Makefile
U usr.bin/procstat/procstat_files.c
U usr.bin/procstat/procstat_vm.c
U usr.bin/tar/util.c
U usr.bin/tar/test/Makefile
U usr.bin/tar/test/test_strip_components.c
U usr.bin/tar/test/test_symlink_dir.c
U usr.bin/xargs/xargs.1
U usr.sbin/mtree/mtree.c

We see that sys/kern/vfs_cache.c is, indeed, among them. And:

dwolf-bsd(7.1-P)[5] grep '\$FreeBSD' /sys/kern/vfs_cache.c
__FBSDID("$FreeBSD: src/sys/kern/vfs_cache.c,v 1.114.2.3 2008/12/09 16:20:58 kib Exp $");
dwolf-bsd(7.1-P)[6]

That should correspond to the desired version of the file.
Here we see an excerpt from the ktrace output for the amd(8) process and its children; this is a point when amd(8) is trying an unmount() to see if it can get away with it:

977 amd 1229033597.269612 CALL gettimeofday(0x807ad48,0)
977 amd 1229033597.269620 RET gettimeofday 0
977 amd 1229033597.269630 CALL sigprocmask(SIG_BLOCK,0xbfbfeaec,0xbfbfeadc)
977 amd 1229033597.269637 RET sigprocmask 0
977 amd 1229033597.269645 CALL fork
977 amd 1229033597.273810 RET fork 1712/0x6b0
1712 amd 1229033597.273811 RET fork 0
977 amd 1229033597.273836 CALL sigprocmask(SIG_SETMASK,0xbfbfeadc,0)
1712 amd 1229033597.273845 CALL getpid
977 amd 1229033597.273850 RET sigprocmask 0
1712 amd 1229033597.273855 RET getpid 1712/0x6b0
977 amd 1229033597.273864 CALL gettimeofday(0x807ad48,0)
977 amd 1229033597.273874 RET gettimeofday 0
1712 amd 1229033597.273878 CALL unmount(0x2832c610,invalid0)
...
1712 amd 1229033597.352643 RET unmount -1 errno 16 Device busy
1712 amd 1229033597.352695 CALL sigprocmask(SIG_BLOCK,0x28097c00,0xbfbfea0c)
1712 amd 1229033597.352728 RET sigprocmask 0
1712 amd 1229033597.352751 CALL sigprocmask(SIG_SETMASK,0x28097c10,0)
1712 amd 1229033597.352769 RET sigprocmask 0
1712 amd 1229033597.352781 CALL sigprocmask(SIG_BLOCK,0x28097c00,0xbfbfe9dc)
1712 amd 1229033597.352790 RET sigprocmask 0
1712 amd 1229033597.352801 CALL sigprocmask(SIG_SETMASK,0x28097c10,0)
1712 amd 1229033597.352805 RET sigprocmask 0
1712 amd 1229033597.352815 CALL exit(0x10)
977 amd 1229033597.353085 RET select -1 errno 4 Interrupted system call
977 amd 1229033597.353093 PSIG SIGCHLD caught handler=0x805de50 mask=0x0 code=0x0
977 amd
Re: NFS ( amd?) dysfunction descending a hierarchy
On Fri, Dec 12, 2008 at 03:41:29PM +0200, Kostik Belousov wrote: ...

* At 1229033597.287187 it issues an fstatfs() against FD 4; the unsuccessful return is at 1229033597.287195, claiming ENOENT. Say WHAT??!? ...

But is this error transient or permanent? I.e., would a restart of rm succeed or fail?

In a test yesterday, it took 3 attempts (each attempt being an invocation of rm -fr ~bspace/ports) to actually complete removal of the hierarchy. Please note that:

* Done on a locally-mounted file system (vs. NFS), a single invocation is sufficient and terminates normally. Each of the above-cited attempts but the last terminated with a status code of 1 (as well as a whine that one or more subdirectories was not empty -- this, as a result of rm getting inconsistent information about the status of the file system).

* Done on either a locally- or NFS-mounted file system in FreeBSD 6.x, a single invocation is sufficient and terminates normally. In other words, this is a regression.

Anyway, this error looks different too.

? From the earlier-posted results in 7.x? Not that I can tell. In each case, the amd(8) child process is forked to attempt an unmount(), tries it, gets EBUSY, and exits. Meanwhile, rm(1) is descending a directory tree. It had performed a readdir(), and had been unlinking files and performing rmdir() against empty subdirectories. It encounters an entry, issues stat(), finds that it's a subdirectory, open()s it, gets an FD, issues fstat(), gets results that match those of the earlier stat(), issues fcntl() against the FD (which returns 0), tries to issue fstatfs() against the FD *that is still open*, and gets told ENOENT.

It does differ from the behavior in 8-CURRENT, in that the amd(8) child process in 8-CURRENT does not appear to get EBUSY. The behavior from rm(1)'s perspective is very similar, though.
If it would help, I could try getting a ktrace from a 6.x system, but I expect it will be very boring: the amd(8) child process should get EBUSY (as it does in 7.x), and nothing else should happen, since the unmount() attempt failed. And since it failed, rm(1) doesn't get told inconsistent information, so things Just Work.

I admit that I'm no expert on VFS or much of the rest of the kernel, for that matter. But what I have observed happening in recent 7.x is both wrong and a regression.

Peace,
david
--
David H. Wolfskill da...@catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.
See http://www.catwhisker.org/~david/publickey.gpg for my public key.
Re: NFS ( amd?) dysfunction descending a hierarchy
On Tue, Dec 09, 2008 at 02:20:05PM -0800, Julian Elischer wrote:

Kostik Belousov wrote:

On Tue, Dec 09, 2008 at 11:01:10AM -0800, David Wolfskill wrote:

On Tue, Dec 02, 2008 at 04:15:38PM -0800, David Wolfskill wrote:

I seem to have a fairly- (though not deterministically so) reproducible mode of failure with an NFS-mounted directory hierarchy: An attempt to traverse a sufficiently large hierarchy (e.g., via tar zcpf or rm -fr) will fail to visit some subdirectories, typically apparently acting as if the subdirectories in question do not actually exist (despite the names having been returned in the output of a previous readdir()). ...

Did you see my previous answer? The supposed patch for your problem was committed to head as r185557, and MFCed to 7 in r185796, and to 7.1 in r185801. Please test with the latest sources.

did you notice that he tested with latest -current and releng 7?

Yes, and the failure mode on HEAD looks like a different issue.
Re: NFS ( amd?) dysfunction descending a hierarchy
On Wed, Dec 10, 2008 at 11:30:26AM -0500, Rick Macklem wrote: ...

The different behaviour for -CURRENT could be the newer RPC layer that was recently introduced, but that doesn't explain the basic problem. OK.

All I can think of is to ask the obvious question. Are you using interruptible or soft mounts? If so, switch to hard mounts and see if the problem goes away. (imho, neither interruptible nor soft mounts are a good idea. You can use a forced dismount if there is a crashed NFS server that isn't coming back anytime soon.)

From examination of /etc/amd* -- I don't see how to get mount(8) or amq(8) to report it -- it appears that we are using interruptible mounts, as we always have. The point is that the behavior has changed in an unexpected way. And I'm not so sure that the use of a forced dismount is generally available, as it would require logging in to the NFS client first, which may be difficult if the NFS server hosting non-root home directories is failing to respond and direct root login via ssh(1) is not permitted (as is the default).

If you are getting this with hard mounts, I'm afraid I have no idea what the problem is, rick.

What concerns me is that even if the attempted unmount gets EBUSY, the user-level process descending the directory hierarchy is getting ENOENT trying to issue fstatfs() against an open file descriptor. I'm having trouble figuring out any way that makes any sense.

Peace,
david
--
David H. Wolfskill [EMAIL PROTECTED]
Depriving a girl or boy of an opportunity for education is evil.
See http://www.catwhisker.org/~david/publickey.gpg for my public key.
Re: NFS ( amd?) dysfunction descending a hierarchy
On Wed, Dec 10, 2008 at 08:50:22AM -0800, David Wolfskill wrote:

On Wed, Dec 10, 2008 at 11:30:26AM -0500, Rick Macklem wrote: ...

The different behaviour for -CURRENT could be the newer RPC layer that was recently introduced, but that doesn't explain the basic problem. OK.

All I can think of is to ask the obvious question. Are you using interruptible or soft mounts? If so, switch to hard mounts and see if the problem goes away. (imho, neither interruptible nor soft mounts are a good idea. You can use a forced dismount if there is a crashed NFS server that isn't coming back anytime soon.)

From examination of /etc/amd* -- I don't see how to get mount(8) or amq(8) to report it -- it appears that we are using interruptible mounts, as we always have. The point is that the behavior has changed in an unexpected way. And I'm not so sure that the use of a forced dismount is generally available, as it would require logging in to the NFS client first, which may be difficult if the NFS server hosting non-root home directories is failing to respond and direct root login via ssh(1) is not permitted (as is the default).

If you are getting this with hard mounts, I'm afraid I have no idea what the problem is, rick.

What concerns me is that even if the attempted unmount gets EBUSY, the user-level process descending the directory hierarchy is getting ENOENT trying to issue fstatfs() against an open file descriptor. I'm having trouble figuring out any way that makes any sense.

Basically, the problem is that NFS uses shared lookup, and this allows for a bug where several negative namecache entries are created for a non-existent node. Then the node gets created, removing only the first negative namecache entry. For some reason the vnode is then reclaimed; amd's trial unmount is a good reason for the vnode to be reclaimed. Now you have an existing path and a negative cache entry. This was first reported by Peter Holm; I listed the relevant revisions that should fix this in a previous mail.
Re: NFS ( amd?) dysfunction descending a hierarchy
On Tue, 9 Dec 2008, David Wolfskill wrote:

On Tue, Dec 02, 2008 at 04:15:38PM -0800, David Wolfskill wrote:

I seem to have a fairly- (though not deterministically so) reproducible mode of failure with an NFS-mounted directory hierarchy: An attempt to traverse a sufficiently large hierarchy (e.g., via tar zcpf or rm -fr) will fail to visit some subdirectories, typically apparently acting as if the subdirectories in question do not actually exist (despite the names having been returned in the output of a previous readdir()). ...

I was able to reproduce the external symptoms of the failure running CURRENT as of yesterday, using rm -fr of a copy of a recent /usr/ports hierarchy on an NFS-mounted file system as a test case. However, I believe the mechanism may be a bit different -- while still being other than what I would expect. One aspect in which the externally-observable symptoms were different (under CURRENT, vs. RELENG_7) is that under CURRENT, once the error condition occurred, the NFS client machine was in a state where it merely kept repeating nfs server [EMAIL PROTECTED]:/volume: not responding until I logged in as root and rebooted it.

The different behaviour for -CURRENT could be the newer RPC layer that was recently introduced, but that doesn't explain the basic problem. All I can think of is to ask the obvious question. Are you using interruptible or soft mounts? If so, switch to hard mounts and see if the problem goes away. (imho, neither interruptible nor soft mounts are a good idea. You can use a forced dismount if there is a crashed NFS server that isn't coming back anytime soon.) If you are getting this with hard mounts, I'm afraid I have no idea what the problem is, rick.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: NFS ( amd?) dysfunction descending a hierarchy
On Tue, Dec 02, 2008 at 04:15:38PM -0800, David Wolfskill wrote:

I seem to have a fairly- (though not deterministically so) reproducible mode of failure with an NFS-mounted directory hierarchy: An attempt to traverse a sufficiently large hierarchy (e.g., via tar zcpf or rm -fr) will fail to visit some subdirectories, typically apparently acting as if the subdirectories in question do not actually exist (despite the names having been returned in the output of a previous readdir()). ...

I was able to reproduce the external symptoms of the failure running CURRENT as of yesterday, using rm -fr of a copy of a recent /usr/ports hierarchy on an NFS-mounted file system as a test case. However, I believe the mechanism may be a bit different -- while still being other than what I would expect. One aspect in which the externally-observable symptoms were different (under CURRENT, vs. RELENG_7) is that under CURRENT, once the error condition occurred, the NFS client machine was in a state where it merely kept repeating nfs server [EMAIL PROTECTED]:/volume: not responding until I logged in as root and rebooted it.

Here's a cut/paste of the kdump from the ktrace of the amd(8) process under CURRENT, showing where the master amd(8) process (pid 848) forks a child (4126) to try the unmount:

848 amd 1228846258.722953 CALL gettimeofday(0x8078e48,0)
848 amd 1228846258.722964 RET gettimeofday 0
848 amd 1228846258.722982 CALL sigprocmask(SIG_BLOCK,0xbfbfeaec,0xbfbfeadc)
848 amd 1228846258.722993 RET sigprocmask 0
848 amd 1228846258.723003 CALL fork
848 amd 1228846258.730250 RET fork 4126/0x101e
848 amd 1228846258.730405 CALL sigprocmask(SIG_SETMASK,0xbfbfeadc,0)
4126 amd 1228846258.730252 RET fork 0
4126 amd 1228846258.730456 CALL getpid
4126 amd 1228846258.730467 RET getpid 4126/0x101e
4126 amd 1228846258.730493 CALL unmount(0x2825f340,invalid0)
848 amd 1228846258.730422 RET sigprocmask 0
848 amd 1228846258.730595 CALL gettimeofday(0x8078e48,0)
848 amd 1228846258.730608 RET gettimeofday 0
...
848 amd 1228846258.914814 CALL sigprocmask(SIG_SETMASK,0xbfbfeba0,0)
848 amd 1228846258.914826 RET sigprocmask 0
848 amd 1228846258.914838 CALL select(0x400,0xbfbfec40,0,0,0xbfbfecd8)
4126 amd 1228846259.090428 RET unmount 0
4126 amd 1228846259.090492 CALL sigprocmask(SIG_BLOCK,0x2809b080,0xbfbfea0c)
4126 amd 1228846259.090505 RET sigprocmask 0
4126 amd 1228846259.090518 CALL sigprocmask(SIG_SETMASK,0x2809b090,0)
4126 amd 1228846259.090530 RET sigprocmask 0
4126 amd 1228846259.090545 CALL sigprocmask(SIG_BLOCK,0x2809b080,0xbfbfe9dc)
4126 amd 1228846259.090556 RET sigprocmask 0
4126 amd 1228846259.090576 CALL sigprocmask(SIG_SETMASK,0x2809b090,0)
4126 amd 1228846259.090587 RET sigprocmask 0
4126 amd 1228846259.090605 CALL exit(0)
848 amd 1228846259.091248 RET select -1 errno 4 Interrupted system call
848 amd 1228846259.091277 PSIG SIGCHLD caught handler=0x805e090 mask=0x0 code=0x0
848 amd 1228846259.091298 CALL wait4(0x,0xbfbfe83c,WNOHANG,0)
848 amd 1228846259.091329 RET wait4 4126/0x101e
848 amd 1228846259.091342 CALL wait4(0x,0xbfbfe83c,WNOHANG,0)
848 amd 1228846259.091352 RET wait4 -1 errno 10 No child processes
848 amd 1228846259.091365 CALL sigprocmask(SIG_SETMASK,0x80795bc,0)
848 amd 1228846259.091377 RET sigprocmask 0
848 amd 1228846259.091390 CALL sigprocmask(SIG_BLOCK,0x80792c4,0)
848 amd 1228846259.091401 RET sigprocmask 0
848 amd 1228846259.091411 CALL gettimeofday(0x8078e48,0)
848 amd 1228846259.091422 RET gettimeofday 0

Note that while the child didn't get EBUSY (as it does under RELENG_7) -- indeed, the unmount call appears to have returned 0 -- the master amd(8) process looks to be seeing errno 4 Interrupted system call.
And here's a relevant part of the kdump from the rm -fr -- I had kdump spit out Epoch timestamps with each in order to make correlation easier:

4121 rm 1228846258.736266 CALL unlink(0x2821c148)
4121 rm 1228846258.736281 NAMI distinfo
4121 rm 1228846258.738329 RET unlink 0
4121 rm 1228846258.738379 CALL unlink(0x2821c1b8)
4121 rm 1228846258.738401 NAMI pkg-descr
4121 rm 1228846258.739963 RET unlink 0
4121 rm 1228846258.739982 CALL open(0x28178b6b,O_RDONLY,unused0)
4121 rm 1228846258.740002 NAMI ..
4121 rm 1228846258.740541 RET open 4
4121 rm 1228846258.740558 CALL fstat(0x4,0xbfbfe96c)
4121 rm 1228846258.740579 STRU struct stat {dev=67174155, ino=22674937, mode=drwxr-xr-x , nlink=114, uid=9874, gid=929, rdev=0, atime=1228846258.184514000, stime=1228846258.779501000, ctime=1228846258.779501000, birthtime=-1,
Re: NFS ( amd?) dysfunction descending a hierarchy
On Tue, Dec 09, 2008 at 11:01:10AM -0800, David Wolfskill wrote:

On Tue, Dec 02, 2008 at 04:15:38PM -0800, David Wolfskill wrote:

I seem to have a fairly- (though not deterministically so) reproducible mode of failure with an NFS-mounted directory hierarchy: An attempt to traverse a sufficiently large hierarchy (e.g., via tar zcpf or rm -fr) will fail to visit some subdirectories, typically apparently acting as if the subdirectories in question do not actually exist (despite the names having been returned in the output of a previous readdir()). ...

Did you see my previous answer? The supposed patch for your problem was committed to head as r185557, and MFCed to 7 in r185796, and to 7.1 in r185801. Please test with the latest sources.
Re: NFS ( amd?) dysfunction descending a hierarchy
Kostik Belousov wrote:

On Tue, Dec 09, 2008 at 11:01:10AM -0800, David Wolfskill wrote:

On Tue, Dec 02, 2008 at 04:15:38PM -0800, David Wolfskill wrote:

I seem to have a fairly- (though not deterministically so) reproducible mode of failure with an NFS-mounted directory hierarchy: An attempt to traverse a sufficiently large hierarchy (e.g., via tar zcpf or rm -fr) will fail to visit some subdirectories, typically apparently acting as if the subdirectories in question do not actually exist (despite the names having been returned in the output of a previous readdir()). ...

Did you see my previous answer? The supposed patch for your problem was committed to head as r185557, and MFCed to 7 in r185796, and to 7.1 in r185801. Please test with the latest sources.

did you notice that he tested with latest -current and releng 7?
Re: NFS ( amd?) dysfunction descending a hierarchy
On Tue, Dec 09, 2008 at 02:20:05PM -0800, Julian Elischer wrote:

Kostik Belousov wrote: ...

Did you see my previous answer? The supposed patch for your problem was committed to head as r185557, and MFCed to 7 in r185796, and to 7.1 in r185801. Please test with the latest sources.

did you notice that he tested with latest -current and releng 7?

CURRENT was as of yesterday, as was RELENG_7; kib@'s commit hit HEAD on 02 Dec, but didn't hit RELENG_7 until after I grabbed the sources for RELENG_7 yesterday. I have some local infrastructure hassles to deal with so I can update the sources in question, but I will test RELENG_7 with the commit and report back.

Peace,
david
--
David H. Wolfskill [EMAIL PROTECTED]
Depriving a girl or boy of an opportunity for education is evil.
See http://www.catwhisker.org/~david/publickey.gpg for my public key.
Re: NFS ( amd?) dysfunction descending a hierarchy
I seem to have a fairly- (though not deterministically so) reproducible mode of failure with an NFS-mounted directory hierarchy: An attempt to traverse a sufficiently large hierarchy (e.g., via tar zcpf or rm -fr) will fail to visit some subdirectories, typically apparently acting as if the subdirectories in question do not actually exist (despite the names having been returned in the output of a previous readdir()).

The file system is mounted read-write, courtesy of amd(8); none of the files has any non-default flags; there are no ACLs involved; and I owned the lot (that is, as owning user of the files). An example of sufficiently large has been demonstrated to be a recent copy of a FreeBSD ports tree. (The problem was discovered using a hierarchy that had some proprietary content; I tried a copy of the ports tree to see if I could replicate the issue with something a FreeBSD hacker would more likely have handy. And avoid NDA issues. :-})

Now, before I go further: I'm not pointing the finger at FreeBSD, here (yet). At minimum, there could be fault with FreeBSD (as the NFS client); with amd(8); with the NetApp Filer (as the NFS server); or the network -- or the configuration(s) of any of them. But I just tried this, using the same NFS server, but a machine running Solaris 8 as an NFS client, and was unable to re-create the problem.

And I found a way to avoid having the problem occur using a FreeBSD NFS client: whack amd(8)'s config so that the dismount_interval is 12 hours instead of the default 2 minutes, thus effectively preventing amd(8) from its normal attempts to unmount file systems. Please note that I don't consider this a fix -- or even an acceptable circumvention, in the long term. Rather, it's a diagnostic change, in an attempt to better understand the nature of the problem.
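For concreteness, the diagnostic change amounts to something like the following in amd.conf. This is an illustrative fragment under the am-utils amd.conf syntax, not the actual configuration from the site in question:

```
# amd.conf -- [ global ] section, illustrative only
[ global ]
# default is 120 seconds; 43200 = 12 hours, which effectively stops
# amd's periodic trial unmounts of idle file systems
dismount_interval = 43200
```

The interesting part is that this should be a pure performance/policy knob; the fact that it makes a correctness problem disappear is what points at the unmount path.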
Here are step-by-step instructions to recreate the problem; unfortunately, I believe I don't have the resources to test this anywhere but at work, though I will try it at home, to the extent that I can:

* Set up the environment.
  * The failing environment uses NetApp filers as NFS servers. I don't know what kind or how recent the software is on them, but can find out. (I expect they're fairly well-maintained.)
  * Ensure that the NFS space available is at least 10 GB or more. I will refer to this as ~/NFS/, as I tend to create such symlinks to keep track of things.
  * I used a dual, quad-core machine running FreeBSD RELENG_7_1 as of yesterday morning as an NFS client. It also had a recently-updated /usr/ports tree, which was a CVS working directory (so each real subdirectory also had a CVS subdirectory within it).
  * Set up amd(8) so that ~/NFS is mounted on demand when it's referenced, and only via amd(8). Ensure that the dismount_interval has the default value of 120 seconds.
* Create a reference tarball.
  * cd /usr
    tar zcpf ~/NFS/ports.tgz ports/
* Create the test directory hierarchy.
  * cd ~/NFS
    tar zxpf ports.tgz
* Clear any cache.
  * Unmount ~/NFS, then re-mount it. Or just reboot the NFS client machine. Or arrange to have done all of the above set-up stuff from a different NFS client.
* Set up for information capture (optional).
  * Use ps(1) or your favorite alternative tool to determine the PID for amd(8). Note that `cat /var/run/amd.pid` won't do the trick. :-{
  * Run ktrace(1) to capture activity from amd(8) and its descendants, e.g.: sudo ktrace -dip ${amd_pid} -f ktrace_amd.out
  * Start a packet-capture for NFS traffic, e.g.: sudo tcpdump -s 0 -n -w nfs.bpf host ${nfs_server}
* Start the test.
  * Do this under ktrace(1), if you did the above optional step: rm -fr ~/NFS/ports; echo $?
    As soon as rm(1) issues a whine, you might as well interrupt it (^C).
* Stop the information capture, if you started it.
  * ^C for the tcpdump(1) process.
  * sudo ktrace -C

If the packet capture file is too big for the analysis program you prefer to digest as a unit, see the net/tcpslice port for a bit of relief. (Wireshark seems to want to read an entire packet capture file into main memory.)

I have performed the above, with the information-gathering step; I can *probably* make that information available, but I'll need to check -- some organizations get paranoid about things like host names. I don't expect that my current employer is, but I don't know yet, so I won't promise. In the mean time, I should be able to extract somewhat-relevant information from what I've collected, if that would be useful. While I wouldn't mind sharing the results, I strongly suspect that blow-by-blow analysis wouldn't be ideal for this (or any other) mailing list; I would be very happy to work
Re: NFS ( amd?) dysfunction descending a hierarchy
On Wed, Dec 03, 2008 at 02:20:32PM +0200, Danny Braniss wrote: ...

i'll try to check it here soon, but in the meantime, could you try the same but mounting directly, not via amd, to remove one item from the equation? (I don't know how much amd is involved here, but if you are running on a 64bit host, amd could be swapped out, in which case it tends to really screw things up, which is not your case, but ...)

Sorry; I should have mentioned that the NFS client was running RELENG_7_1 as of Monday morning, i386 arch. The amd.conf file specifies plock for amd(8).

Note that merely telling amd(8) to kick the interval of attempted unmounts from 2 minutes to 12 hours appears to avoid the observed symptoms, so I'm fairly confident that bypassing amd(8) altogether would do so as well.

In looking at the output from ktrace against amd(8), I recall having seen that shortly before an observed failure, the (master) amd process forks a child to attempt the unmount; the child issues an unmount, the return for which is EBUSY (IIRC -- I'm not in a good position to check just at the moment), so the child terminates with an interrupted system call. I'd have thought that since the attempted unmount failed, it wouldn't make any difference, but it's right around that point that rm(1) is told that a directory entry it found earlier doesn't exist, which rather snowballs into the previously-described symptoms.

Peace,
david
--
David H. Wolfskill [EMAIL PROTECTED]
Depriving a girl or boy of an opportunity for education is evil.
See http://www.catwhisker.org/~david/publickey.gpg for my public key.
Re: NFS ( amd?) dysfunction descending a hierarchy
On Wed, Dec 03, 2008 at 02:20:32PM +0200, Danny Braniss wrote:
> > ... i'll try to check it here soon, but in the meantime, could you try the same but mounting directly, not via amd, to remove one item from the equation? (I don't know how much amd is involved here, but if you are running on a 64bit host, amd could be swapped out, in which case it tends to really screw things up, which is not your case, but ...)
>
> Sorry; I should have mentioned that the NFS client was running RELENG_7_1 as of Monday morning, i386 arch. The amd.conf file specifies plock for amd(8).
>
> Note that merely telling amd(8) to kick the interval of attempted unmounts from 2 minutes to 12 hours appears to avoid the observed symptoms, so I'm fairly confident that bypassing amd(8) altogether would do so as well.
>
> In looking at the output from ktrace against amd(8), I recall having seen that shortly before an observed failure, the (master) amd process forks a child to attempt the unmount; the child issues an unmount, the return for which is EBUSY (IIRC -- I'm not in a good position to check just at the moment), so the child terminates with an interrupted system call. I'd have thought that since the attempted unmount failed, it wouldn't make any difference, but it's right around that point that rm(1) is told that a directory entry it found earlier doesn't exist, which rather snowballs into the previously-described symptoms.

so it does point to amd -- or something innocent it does -- which triggers the error.

btw, there are some patches (5, I think) that try to fix some of amd's problems. I've installed them, and things are quiet/OK -- most of the time -- but I get a glitch once in a while. Would love to iron them out, though.
cheers,
danny
NFS ( amd?) dysfunction descending a hierarchy
I seem to have a fairly- (though not deterministically so) reproducible mode of failure with an NFS-mounted directory hierarchy: an attempt to traverse a sufficiently large hierarchy (e.g., via tar zcpf or rm -fr) will fail to visit some subdirectories, typically acting as if the subdirectories in question do not actually exist (despite their names having been returned in the output of a previous readdir()). The file system is mounted read-write, courtesy of amd(8); none of the files has any non-default flags; there are no ACLs involved; and I owned the lot (that is, as owning user of the files). An example of "sufficiently large" has been demonstrated to be a recent copy of a FreeBSD ports tree. (The problem was discovered using a hierarchy that had some proprietary content; I tried a copy of the ports tree to see if I could replicate the issue with something a FreeBSD hacker would more likely have handy. And avoid NDA issues. :-})

Now, before I go further: I'm not pointing the finger at FreeBSD here (yet). At minimum, there could be fault with FreeBSD (as the NFS client); with amd(8); with the NetApp Filer (as the NFS server); or with the network -- or the configuration(s) of any of them. But I just tried this, using the same NFS server but a machine running Solaris 8 as the NFS client, and was unable to re-create the problem.

And I found a way to avoid having the problem occur using a FreeBSD NFS client: whack amd(8)'s config so that the dismount_interval is 12 hours instead of the default 2 minutes, thus effectively preventing amd(8) from its normal attempts to unmount file systems. Please note that I don't consider this a fix -- or even an acceptable circumvention, in the long term. Rather, it's a diagnostic change, in an attempt to better understand the nature of the problem.
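The "name returned by readdir() but absent on lookup" symptom can be checked by hand. Below is a small shell sketch of my own (not part of the original report; the function name is made up) that walks a hierarchy and reports any entry the shell's glob (i.e., readdir()) returns but that a follow-up lookup says doesn't exist:

```shell
#!/bin/sh
# detect_ghosts: walk a hierarchy and report directory entries that the
# glob (readdir()) returns but that a subsequent lookup (test -e) says do
# not exist.  On an affected NFS client, any "ghost:" line would flag the
# traversal symptom described above.  Hidden files are skipped for brevity.
detect_ghosts() {
    find "$1" -type d | while IFS= read -r d; do
        for name in "$d"/*; do
            # An unmatched glob expands to the literal pattern; skip it.
            [ "$name" = "$d/*" ] && continue
            if [ ! -e "$name" ] && [ ! -L "$name" ]; then
                echo "ghost: $name"
            fi
        done
    done
}

# Example (path hypothetical): scan the hierarchy rm(1) complained about.
# detect_ghosts ~/NFS/ports
```

On a healthy file system this should print nothing; it is only a detector, not a fix.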
Here are step-by-step instructions to recreate the problem; unfortunately, I believe I don't have the resources to test this anywhere but at work, though I will try it at home, to the extent that I can:

* Set up the environment.
  * The failing environment uses NetApp filers as NFS servers. I don't know what kind or how recent the software is on them, but can find out. (I expect they're fairly well-maintained.)
  * Ensure that at least 10 GB of NFS space is available. I will refer to this as ~/NFS/, as I tend to create such symlinks to keep track of things.
  * I used a dual, quad-core machine running FreeBSD RELENG_7_1 as of yesterday morning as an NFS client. It also had a recently-updated /usr/ports tree, which was a CVS working directory (so each real subdirectory also had a CVS subdirectory within it).
  * Set up amd(8) so that ~/NFS is mounted on demand when it's referenced, and only via amd(8). Ensure that the dismount_interval has the default value of 120 seconds.
* Create a reference tarball.
  * cd /usr
  * tar zcpf ~/NFS/ports.tgz ports/
* Create the test directory hierarchy.
  * cd ~/NFS
  * tar zxpf ports.tgz
* Clear any cache.
  * Unmount ~/NFS, then re-mount it. Or just reboot the NFS client machine. Or arrange to have done all of the above set-up stuff from a different NFS client.
* Set up for information capture (optional).
  * Use ps(1) or your favorite alternative tool to determine the PID for amd(8). Note that `cat /var/run/amd.pid` won't do the trick. :-{
  * Run ktrace(1) to capture activity from amd(8) and its descendants, e.g.: sudo ktrace -dip ${amd_pid} -f ktrace_amd.out
  * Start a packet capture for NFS traffic, e.g.: sudo tcpdump -s 0 -n -w nfs.bpf host ${nfs_server}
* Start the test.
  * Do this under ktrace(1), if you did the above optional step: rm -fr ~/NFS/ports; echo $?
  * As soon as rm(1) issues a whine, you might as well interrupt it (^C).
* Stop the information capture, if you started it.
  * ^C for the tcpdump(1) process.
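The information-capture steps above can be sketched as one wrapper script. This is my own consolidation, not part of the original report; the server name and ~/NFS paths are placeholders, and DRY_RUN is a made-up convenience so the commands can be previewed without running them:

```shell
#!/bin/sh
# Consolidated sketch of the capture steps.  Set DRY_RUN=1 to only print
# the commands that would be run.

# Echo each command, and execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "$DRY_RUN" ] || "$@"
}

capture_nfs_failure() {
    nfs_server=$1
    # Per the note above, /var/run/amd.pid won't give the right PID, so
    # find the live amd process instead ("unknown" if amd isn't running).
    amd_pid=$(pgrep -n amd || echo unknown)
    run sudo ktrace -dip "$amd_pid" -f ktrace_amd.out
    run sudo tcpdump -s 0 -n -w nfs.bpf host "$nfs_server" &
    run rm -fr "$HOME/NFS/ports"
    # Once rm(1) whines, interrupt it and the tcpdump, then stop ktrace:
    run sudo ktrace -C
}

# Example (dry run): DRY_RUN=1 capture_nfs_failure filer.example.com
```

The tcpdump is backgrounded so it runs alongside the rm; in real use you would interrupt it by hand, as the steps above describe.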
  * sudo ktrace -C

If the packet capture file is too big for the analysis program you prefer to digest as a unit, see the net/tcpslice port for a bit of relief. (Wireshark seems to want to read an entire packet capture file into main memory.)

I have performed the above, with the information-gathering step; I can *probably* make that information available, but I'll need to check -- some organizations get paranoid about things like host names. I don't expect that my current employer is, but I don't know yet, so I won't promise. In the meantime, I should be able to extract somewhat-relevant information from what I've collected, if that would be useful.

While I wouldn't mind sharing the results, I strongly suspect that blow-by-blow analysis wouldn't be ideal for this (or any other) mailing list; I would be very happy to work with others to figure out what's gone wrong (or is misconfigured) and get things working properly. If someone(s) would be willing to help, I'd appreciate it very much. If (enough) folks would actually prefer that the details stay in