On Tue, 9 Dec 2008, David Wolfskill wrote:

On Tue, Dec 02, 2008 at 04:15:38PM -0800, David Wolfskill wrote:
I seem to have a fairly- (though not deterministically so) reproducible
mode of failure with an NFS-mounted directory hierarchy:  An attempt to
traverse a "sufficiently large" hierarchy (e.g., via "tar zcpf" or "rm
-fr") will fail to "visit" some subdirectories, typically apparently
acting as if the subdirectories in question do not actually exist
(despite the names having been returned in the output of a previous
readdir()).
...
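
One way to pin down the symptom described above without deleting anything is a
read-only walk that records every name the directory listing returned but that
a subsequent lstat() reports as nonexistent. The following is only a minimal
sketch under stated assumptions: the function name find_vanishing_entries is
hypothetical, it was not part of the original report, and it assumes nothing
else is modifying the tree while it runs (a concurrent delete would show up as
a false positive).

```python
import os
import stat
import sys


def find_vanishing_entries(root):
    """Return paths that a directory listing named but lstat() could not find."""
    vanished = []
    pending = [root]
    while pending:
        current = pending.pop()
        try:
            # os.listdir() returns the names the readdir() calls produced.
            names = os.listdir(current)
        except FileNotFoundError:
            # The directory was stat-able a moment ago but cannot be opened now.
            vanished.append(current)
            continue
        for name in names:
            path = os.path.join(current, name)
            try:
                info = os.lstat(path)
            except FileNotFoundError:
                # Listed by the parent directory, yet reported as nonexistent.
                vanished.append(path)
                continue
            if stat.S_ISDIR(info.st_mode):
                pending.append(path)
    return vanished


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in find_vanishing_entries(sys.argv[1]):
        print("listed but missing:", path)
```

Run on the NFS client with the mount point (or a subtree of it) as the only
argument. An empty result on a quiet tree does not rule the problem out, since
the failure is not deterministic.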

I was able to reproduce the external symptoms of the failure running
CURRENT as of yesterday, using "rm -fr" of a copy of a recent
/usr/ports hierarchy on an NFS-mounted file system as a test case.
However, I believe the mechanism may be a bit different -- while
still being other than what I would expect.

One aspect in which the externally-observable symptoms were different
(under CURRENT, vs. RELENG_7) is that under CURRENT, once the error
condition occurred, the NFS client machine was in a state where it
merely kept repeating

        nfs server [EMAIL PROTECTED]:/volume: not responding

until I logged in as root & rebooted it.

The different behaviour for -CURRENT could be due to the newer RPC layer
that was recently introduced, but that doesn't explain the basic problem.

All I can think of is to ask the obvious question. "Are you using
interruptible or soft mounts?" If so, switch to hard mounts and see
if the problem goes away. (imho, neither interruptible nor soft mounts
are a good idea. You can use a forced dismount if there is a crashed
NFS server that isn't coming back anytime soon.)

If you are getting this with hard mounts, I'm afraid I have no idea
what the problem is.

rick

_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
