Re: error building kernel: nfs_clvfsops.o: In function `nfs_mount':, nfs_clvfsops.c:(.text+0x1638): undefined reference to `nfs_diskless_valid'
Since today's source update (FreeBSD 9.0-CURRENT/amd64, source revision 221060), I get the following error while building the kernel (with options NFSD / options NFSCL instead of options NFSSERVER / options NFSCLIENT):

cc -c -O2 -frename-registers -pipe -fno-strict-aliasing -march=native -std=c99 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/contrib/altq -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-omit-frame-pointer -mcmodel=kernel -mno-red-zone -mfpmath=387 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector -Werror vers.c
linking kernel
nfs_clvfsops.o: In function `nfs_mount':
nfs_clvfsops.c:(.text+0x1638): undefined reference to `nfs_diskless_valid'
nfs_clvfsops.c:(.text+0x1652): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1658): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1689): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x16d1): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1712): undefined reference to `nfsv3_diskless'
nfs_clvfsops.o:nfs_clvfsops.c:(.text+0x171b): more undefined references to `nfsv3_diskless' follow
nfs_clvfsops.o: In function `nfs_mount':
nfs_clvfsops.c:(.text+0x1e19): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1e2a): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1e31): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1e3d): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1e44): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1e4a): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1e50): undefined reference to `nfs_diskless'
nfs_clvfsops.o:nfs_clvfsops.c:(.text+0x1e57): more undefined references to `nfs_diskless' follow
nfs_clvfsops.o: In function `nfs_mount':
nfs_clvfsops.c:(.text+0x1e65): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1e6b): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1e73): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1e79): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1e80): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1e87): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1e8e): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1e94): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1e9a): undefined reference to `nfs_diskless'
nfs_clvfsops.o:nfs_clvfsops.c:(.text+0x1ea0): more undefined references to `nfs_diskless' follow
nfs_clvfsops.o: In function `nfs_mount':
nfs_clvfsops.c:(.text+0x1eb3): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1ebd): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1ec4): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1ecb): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1ed2): undefined reference to `nfsv3_diskless'
nfs_clvfsops.o:nfs_clvfsops.c:(.text+0x1ed9): more undefined references to `nfsv3_diskless' follow
nfs_clvfsops.o: In function `nfs_mount':
nfs_clvfsops.c:(.text+0x1f18): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1f1e): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1f33): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1f3a): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1f4b): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1f52): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1f5e): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1f6a): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1f71): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1f78): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1f83): undefined reference to `nfs_diskless_valid'
nfs_clvfsops.c:(.text+0x1fcc): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x1fd3): undefined reference to `nfs_diskless'
nfs_clvfsops.c:(.text+0x1fd9): undefined reference to `nfsv3_diskless'
nfs_clvfsops.c:(.text+0x20ae): undefined reference to `nfsv3_diskless'
nfs_clvfsops.o:(.data+0x1f8): undefined reference to `nfsv3_diskless'
nfs_clvfsops.o:(.data+0x258): undefined reference to `nfsv3_diskless'
nfs_clvfsops.o:(.data+0x2b8): undefined reference to `nfs_diskless_valid'
*** Error code 1

Oops, you'll have to add options NFS_ROOT to your kernel config until I commit a fix. Thanks for spotting it, rick

ps: And a fresh config KERNEL followed by a build. I suspect you already did that.
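In kernel-config terms, the workaround Rick describes amounts to something like the following sketch (GENERIC-style option lines; NFSCL/NFSD are the new client/server options the poster was already using, and the NFS_ROOT line is the suggested addition):

```
options NFSCL       # new NFS client (in place of NFSCLIENT)
options NFSD        # new NFS server (in place of NFSSERVER)
options NFS_ROOT    # workaround: pulls in the nfs_diskless support the linker wants
```

As the ps above notes, this needs a fresh "config KERNEL" followed by a rebuild to take effect.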
Re: nfs error: No route to host when starting apache ...
On Fri, 1 Apr 2011, Rick Macklem wrote:

>> Since rpc.lockd and rpc.statd expect to be able to do IP broadcast (the same goes for rpcbind), I suspect that might be a problem w.r.t. jails, although I know nothing about how jails work? Oh, and you can use the nolock mount option to avoid use of rpc.lockd and rpc.statd.

> Based on the mount_nfs man page, as well as trying it just in case, this option no longer appears to be available in the 7.x nfs code ... :(

Oops, sorry. The option is called nolockd.

rick

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
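For reference, a client-side mount using the corrected option might look like this (the server name and paths here are illustrative, not from the thread):

```
# /etc/fstab on the client: nolockd skips rpc.lockd/rpc.statd entirely,
# so fcntl()/flock() locks are handled locally on the client instead of
# being forwarded over the wire.
server:/export/data  /mnt/data  nfs  rw,nolockd  0  0
```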
Re: nfs error: No route to host when starting apache ...
> I just set up an nfs mount between two servers ...
>
>   ServerA, nfsd on 192.168.1.8
>   ServerB, nfs client on 192.168.1.7
>
> I have a jail, ServerC, running on 192.168.1.7 ... most operations appear to work, but it looks like 'special files' of a sort aren't working, for when I try and start up Apache, I get:
>
>   [Fri Apr 01 19:42:02 2011] [emerg] (65)No route to host: couldn't grab the accept mutex
>
> When I try and do a 'newaliases', I get:
>
>   # newaliases
>   postalias: fatal: lock /etc/aliases.db: No route to host
>
> Yet, for instance, both MySQL and PostgreSQL are running without any issues ... So, the mount is there, it is readable, it is working ... I can ssh into the jail, I can create files, etc ... I do have rpc.lockd and rpc.statd running on both client / server sides ...

Since rpc.lockd and rpc.statd expect to be able to do IP broadcast (the same goes for rpcbind), I suspect that might be a problem w.r.t. jails, although I know nothing about how jails work?

> I'm not seeing anything in either the man page for mount_nfs *or* nfsd that might account / correct for something like this, but since I'm not sure what this is exactly, I'm not sure exactly what I should be looking for :(
>
> Note that this behaviour happens at the *physical* server level as well, having tested with using postalias to generate the same 'lock' issue above ... Now, I do have mountd/nfsd started with -h to bind them to 192.168.1.8 ... *but* the servers themselves, although on the same switch, do have different default gateways ... I'm not seeing anything within the man page for, say, rpc.statd/rpc.lockd that allows me to bind it to the 192.168.1.0/24 IP, so is it binding to my public IP instead of my private one? So nfsd / mount_nfs can talk fine, as they go through 192.168.1.0/24 as desired, but rpc.statd/rpc.lockd are on the public IPs and not able to talk to each other?
>
> Thx ...
___
freebsd-...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
Re: nfs error: No route to host when starting apache ...
> I do have rpc.lockd and rpc.statd running on both client / server sides ...

Since rpc.lockd and rpc.statd expect to be able to do IP broadcast (the same goes for rpcbind), I suspect that might be a problem w.r.t. jails, although I know nothing about how jails work? Oh, and you can use the nolock mount option to avoid use of rpc.lockd and rpc.statd.
Re: possible NFS lockups
From: Sam Fourman

> On Tue, Jul 27, 2010 at 10:29 AM, krad kra...@googlemail.com wrote:
>> I have a production mail system with an nfs backend. Every now and again we see the nfs die on a particular head end. However it doesn't die across all the nodes. This suggests to me there isn't an issue with the filer itself, and the stats from the filer concur with that. The symptoms are lines like this appearing in dmesg:
>>
>>   nfs server 10.44.17.138:/vol/vol1/mail: not responding
>>   nfs server 10.44.17.138:/vol/vol1/mail: is alive again
>>
>> trussing df, it seems to hang on getfsstat; this is presumably when it tries the nfs mounts
>
> I also have this problem, where nfs locks up on a FreeBSD 9 server and a FreeBSD RELENG_8 client

If by RELENG_8 you mean 8.0 (or pre-8.1), there are a number of patches for the client side krpc. They can be found at:
http://people.freebsd.org/~rmacklem/freebsd8.0-patches
(These are all in FreeBSD 8.1, so ignore this if your client is already running FreeBSD 8.1.)

rick

ps: "lock up" can mean many things. The more specific you can be w.r.t. the behaviour, the more likely it can be resolved. For example:

- No more access to the subtree under the mount point is possible until the client is rebooted. When you do a ps axlH, one process that was accessing a file in the mount point is shown with WCHAN rpclock and STAT DL.

vs

- All access to the mount point stops for about 1 minute and then recovers.

Also, showing what mount options are being used by the client, and whether or not rpc.lockd and rpc.statd are running, can also be useful. And if you can look at the net traffic with wireshark when it is locked up and see if any NFS traffic is happening, that can also be useful.
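As a small aid for the intermittent case: the "not responding" / "is alive again" pairs can be counted per server path to see how often a mount is flapping. This is an illustrative sketch (the `count_nfs_flaps` helper name is made up, not from the thread); it just parses dmesg-style lines:

```shell
# Count "not responding" events per NFS server path from dmesg-style input.
# Frequent flaps point at slow server response or network trouble rather
# than a hard client hang.
count_nfs_flaps() {
    grep 'not responding' \
        | sed 's/^nfs server \(.*\): not responding.*$/\1/' \
        | sort | uniq -c
}

# Demo with the messages quoted above; real use would be:
#   dmesg | count_nfs_flaps
printf '%s\n' \
    'nfs server 10.44.17.138:/vol/vol1/mail: not responding' \
    'nfs server 10.44.17.138:/vol/vol1/mail: is alive again' \
    'nfs server 10.44.17.138:/vol/vol1/mail: not responding' \
    | count_nfs_flaps
# prints a count of 2 for 10.44.17.138:/vol/vol1/mail
```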
Re: possible NFS lockups
From: krad kra...@googlemail.com
To: freebsd-hack...@freebsd.org, FreeBSD Questions freebsd-questions@freebsd.org
Sent: Tuesday, July 27, 2010 11:29:20 AM
Subject: possible NFS lockups

> I have a production mail system with an nfs backend. Every now and again we see the nfs die on a particular head end. However it doesn't die across all the nodes. This suggests to me there isn't an issue with the filer itself, and the stats from the filer concur with that. The symptoms are lines like this appearing in dmesg:
>
>   nfs server 10.44.17.138:/vol/vol1/mail: not responding
>   nfs server 10.44.17.138:/vol/vol1/mail: is alive again
>
> trussing df, it seems to hang on getfsstat; this is presumably when it tries the nfs mounts, eg:
>
>   __sysctl(0xbfbfe224,0x2,0xbfbfe22c,0xbfbfe230,0x0,0x0) = 0 (0x0)
>   mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1746583552 (0x681ac000)
>   mmap(0x682ac000,344064,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 1747632128 (0x682ac000)
>   munmap(0x681ac000,344064) = 0 (0x0)
>   getfsstat(0x68201000,0x1270,0x2,0xbfbfe960,0xbfbfe95c,0x1) = 9 (0x9)
>
> I have played with mount options a fair bit but they don't make much difference. This is what they are set to at present:
>
>   10.44.17.138:/vol/vol1/mail /mail/0 nfs rw,noatime,tcp,acdirmax=320,acdirmin=180,acregmax=320,acregmin=180 0 0
>
> When this locking is occurring I find that if I do a showmount, or mount 10.44.17.138:/vol/vol1/mail again under another mount point, I can access it fine. One thing I have just noticed is that lockd and statd always seem to have died when this happens. Restarting does not help.

lockd and statd implement separate protocols (NLM and NSM) that do locking. The protocols were poorly designed and fundamentally broken imho. (That refers to the protocols and not the implementation.)
I am not familiar with the lockd and statd implementations, but if you don't need file locking to work for the same file when accessed concurrently from multiple clients (heads), you can use the nolockd mount option to avoid using them. (I have no idea if the mail system you are using will work without lockd or not? It should be ok to use nolockd if file locking is only done on a given file from one client node.)

I suspect that some interaction between your server and the client's lockd/statd causes them to crash, and then the client is stuck trying to talk to them, but I don't really know. Looking at where all the processes and threads are sleeping via ps axlH may tell you what is stuck and where.

As others noted, intermittent "server not responding" ... "server ok" messages just indicate slow response from the server and don't mean much. However, if a given process is hung and doesn't recover, knowing what it is sleeping on can help w.r.t. diagnosis.

rick
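Rick's ps axlH suggestion can be narrowed down with a small filter. A sketch, assuming FreeBSD's "ps axl" column order (UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT ...), so the wait channel is field 9 and STAT is field 10; the `find_rpclock_hangs` name is made up for this example:

```shell
# Keep only ps lines for threads in uninterruptible sleep (STAT starting
# with "D") on the "rpclock" wait channel -- the signature described above
# for a process stuck waiting on the NFS krpc layer.
find_rpclock_hangs() {
    awk '$9 == "rpclock" && $10 ~ /^D/'
}

# Typical use on the client would be:
#   ps axlH | find_rpclock_hangs
# Demo on two canned ps-style lines; only the rpclock/DL one survives:
printf '%s\n' \
    '0 71 1 0 20 0 9000 800 rpclock DL - 0:00.00 postalias' \
    '0 72 1 0 20 0 9000 800 select  Ss - 0:00.00 sshd' \
    | find_rpclock_hangs
```

Adjust the field numbers if your ps prints a different column layout.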
Re: FreeBSD NFS client goes into infinite retry loop
On Tue, 23 Mar 2010, John Baldwin wrote:
> Ah, I had read that patch as being a temporary testing hack. If you think that would be a good approach in general, that would be ok with me.

Well, it kinda was. I wasn't betting on it fixing the problem, but since it does... I think just mapping VFS_FHTOVP() errors to ESTALE is ok. Do you think I should ask pjd@ about it or just go ahead with a commit?

Thanks for the help, rick
Re: FreeBSD NFS client goes into infinite retry loop
On Mon, 22 Mar 2010, John Baldwin wrote:
>> It looks like it also returns ESTALE when the inode is invalid (< ROOTINO || > max inodes?) - would an unlinked file in FFS referenced at a later time report an invalid inode?

I'm no ufs guy, but the only way I can think of is if the file system on the server was newfs'd with fewer i-nodes? (Unlikely, but...) (Basically, it is safe to return ESTALE for anything that is not a transient failure that could recover on a retry.)

>> But back to your point, zfs_zget() seems to be failing and returning the EINVAL before zfs_fhtovp() even has a chance to set and check zp_gen. I'm trying to get some more details through the use of gratuitous dprintf()'s, but they don't seem to be making it to any logs or the console, even with vfs.zfs.debug=1 set. Any pointers on how to get these dprintf() calls working? I know diddly (as in absolutely nothing) about zfs.

> That I have no idea on. Maybe Rick can chime in?
>
> I'm actually not sure why we would want to treat a FHTOVP failure as anything but an ESTALE error in the NFS server, to be honest.

As far as I know, only if the underlying file system somehow has a situation where the file handle can't be translated at that point in time, but could be later. I have no idea if any file system is like that, and I don't think such a file system would be an appropriate choice for an NFS server, even if such a beast exists. (Even then, although FreeBSD's client assumes EIO might recover on a retry, that isn't specified in any RFC, as far as I know.) That's why I proposed a patch that simply translates all VFS_FHTOVP() errors to ESTALE in the NFS server. (It seems simpler than chasing down cases in all the underlying file systems?)

rick, chiming in :-)
Re: FreeBSD NFS client goes into infinite retry loop
On Fri, 19 Mar 2010, John Baldwin wrote:
> On Friday 19 March 2010 7:34:23 am Steve Polyack wrote:
>> Hi, we use a FreeBSD 8-STABLE (from shortly after release) system as an NFS server to provide user home directories which get mounted across a few machines (all 6.3-RELEASE). For the past few weeks we have been running into problems where one particular client will go into an infinite loop where it is repeatedly trying to write data, which causes the NFS server to return "reply ok 40 write ERROR: Input/output error PRE: POST:". This retry loop can cause between 20mbps and 500mbps of

I'm afraid I don't quite understand what you mean by "causes the NFS server to return reply ok 40 write ERROR". Is this something logged by syslog (I can't find a printf like this in the kernel sources), or is this something that tcpdump is giving you, or ...? Why I ask is that it seems to say that the server is returning EIO (or maybe 40 == EMSGSIZE). The server should return ESTALE (NFSERR_STALE) after a file has been deleted. If it is returning EIO, then that will cause the client to keep trying to write the dirty block to the server. (EIO is interpreted by the client as a transient error.)

[good stuff snipped]

>> I have a feeling that using NFS in such a manner may simply be prone to such problems, but what confuses me is why the NFS client system is infinitely retrying the write operation and causing itself so much grief.
>
> Yes, your feeling is correct. This sort of race is inherent to NFS if you do not use some sort of locking protocol to resolve the race. The infinite retries sound like a client-side issue. Have you been able to try a newer OS version on a client to see if it still causes the same behavior?

As John notes, having one client delete a file while another is trying to write it is not a good thing.
However, the server should return ESTALE after the file is deleted, and that tells the client that the write can never succeed, so it marks the buffer cache block invalid and returns the error to the app. (The app may not see it if it doesn't check for error returns upon close as well as write, but that's another story...)

If you could look at a packet trace via wireshark when the problem occurs, it would be nice to see what the server is returning. (If it isn't ESTALE and the file no longer exists on the server, then that's a server problem.) If it is returning ESTALE, then the client is busted. (At a glance, the client code looks like it would handle ESTALE as a fatal error for the buffer cache, but that doesn't mean it isn't broken, just that it doesn't appear wrong. Also, it looks like mmap'd writes won't recognize a fatal write error and will just keep trying to write the dirty page back to the server. Take this with a big grain of salt, since I just took a quick look at the sources. FreeBSD 6-8 appear to be pretty much the same as far as this goes, in the client.)

Please let us know if you can see the server's error reply code. Good luck with it, rick

ps: If the server isn't returning ESTALE, you could try switching to the experimental nfs server and see if it exhibits the same behaviour? (-e option on both mountd and nfsd, assuming the server is FreeBSD 8.)
Re: FreeBSD NFS client goes into infinite retry loop
On Fri, 19 Mar 2010, Steve Polyack wrote:
> To anyone who is interested: I did some poking around with DTrace, which led me to the nfsiod client code. In src/sys/nfsclient/nfs_nfsiod.c:
>
>     } else {
>         if (bp->b_iocmd == BIO_READ)
>             (void) nfs_doio(bp->b_vp, bp, bp->b_rcred, NULL);
>         else
>             (void) nfs_doio(bp->b_vp, bp, bp->b_wcred, NULL);
>     }
>
> If you look at nfs_doio(), it decides whether or not to mark the buffer invalid, based on the return value it gets. Some (EINTR, ETIMEDOUT, EIO) are not considered fatal, but the others are.

(When the async I/O daemons call nfs_doio(), they are threads that couldn't care less if the underlying I/O op succeeded. The outcome of the I/O operation determines what nfs_doio() does with the buffer cache block.)

> The result is that my problematic repeatable circumstance begins logging "nfssvc_iod: iod 0 nfs_doio returned errno: 5" (corresponding to NFSERR_INVAL?) for each repetition of the failed write. The only things triggering this are my failed writes. I can also see the nfsiod0 process waking up each iteration.

Nope, errno 5 is EIO, and that's where the problem is. I don't know why the server is returning EIO after the file has been deleted on the server (I assume you did that when running your little shell script?).

> Do we need some kind of retry-x-times-then-abort logic within nfssvc_iod(), or does this belong in the subsequent functions, such as nfs_doio()? I think it's best to avoid these sorts of infinite loops which have the potential to take out the system or overload the network due to dumb decisions made by unprivileged users.

Nope, people don't like data not getting written back to a server when it is slow or temporarily network partitioned. The only thing that should stop a client from retrying a write back to the server is a fatal error from the server that says this won't ever succeed.
I think we need to figure out whether the server is really sending EIO (NFS3ERR_IO in wireshark), or whether the server is sending NFS3ERR_STALE and the client is somehow munging that into EIO, causing the confusion.

rick
Re: FreeBSD NFS client goes into infinite retry loop
On Fri, 19 Mar 2010, Steve Polyack wrote:

[good stuff snipped]

> This makes sense. According to wireshark, the server is indeed transmitting "Status: NFS3ERR_IO (5)". Perhaps this should be STALE instead; it sounds more correct than marking it a general IO error. Also, the NFS server is serving its share off of a ZFS filesystem, if it makes any difference. I suppose ZFS could be talking to the NFS server threads with some mismatched language, but I doubt it.

Ok, now I think we're making progress. If VFS_FHTOVP() doesn't return ESTALE when the file no longer exists, the NFS server returns whatever error it has returned. So either VFS_FHTOVP() succeeds after the file has been deleted, which would be a problem that needs to be fixed within ZFS, OR ZFS returns an error other than ESTALE when the file doesn't exist.

Try the following patch on the server (which just makes any error returned by VFS_FHTOVP() into ESTALE) and see if that helps:

--- nfsserver/nfs_srvsubs.c.sav 2010-03-19 22:06:43.0 -0400
+++ nfsserver/nfs_srvsubs.c     2010-03-19 22:07:22.0 -0400
@@ -1127,6 +1127,8 @@
 		}
 	}
 	error = VFS_FHTOVP(mp, fhp->fh_fid, vpp);
+	if (error != 0)
+		error = ESTALE;
 	vfs_unbusy(mp);
 	if (error)
 		goto out;

Please let me know if the patch helps, rick
Re: NFSv4: mount -t nsf4 not the same as mount_newnfs?
On Tue, 9 Feb 2010, O. Hartmann wrote:
> Well, I guess I haven't understood everything of NFSv4. The 'concept' of the 'root' is new to me; maybe there is some deeper explanation of the purpose? Are there supposed to be more than one 'root' entries, or only one?

Only to specify different security flavours for different client host IP#s. There is only one root location in the file system tree. This was done for NFSv4 to avoid any need for the mount protocol. See below.

> At this very moment mounting seems to work, but I always get a 'permission denied' error on every ZFS exported filesystem. Doing the same with UFS2 filesystems, everything works as expected.

In NFSv4, mount does very little, since it does not use the mount protocol. It basically passes a pathname from the NFSv4 root into the kernel for later use. (Since UFS doesn't actually check exports, the experimental server checks them, but cheats and allows a minimal set of NFSv4 operations on non-exported volumes, so that this pathname can be traversed to the exported volume.) At this time ZFS checks exports. As such, everything in the tree from the root specified by the V4: line must be exported for ZFS to work. I believe others have gotten a ZFS export to work, but I have no experience with it at this time.

> Is there a way to inspect the exports and mounts for the used NFS protocol?

Not that I am aware of. (Excluding ZFS, which I don't know anything about, the /etc/exports file specifies the exports.)

> When issuing 'mount', the 'backup' mount is reported to be 'newnfs'. I assume this reflects NFSv4 being used; now I need to figure out what's going wrong with the ZFS export. NFS export of the ZFS filesystem is enabled, but as far as I know this feature is not used in FreeBSD, since ZFS in FreeBSD lacks the capability of autonomously exporting its filesystems via NFS - well, I'm not an expert in this matter.
I'm definitely not a ZFS expert either :-) I think the mount command is showing you that the mount point was created (newnfs refers to the experimental client), but as noted above, that doesn't indicate that it is accessible. (If you haven't tried moving the V4: line to "V4: /backup ...", which moves the NFSv4 root to /backup, you should do that and see how it goes.)

Good luck with it, rick
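Concretely, the suggestion amounts to an /etc/exports along these lines (a sketch only: the flags and network are illustrative, not from the thread; only "V4: /backup" is taken from Rick's advice):

```
# /etc/exports (illustrative): the V4: line sets the NFSv4 root, and
# everything from that root down to the exported volume must itself be
# exported for a ZFS-backed tree to be traversable.
V4: /backup -sec=sys
/backup -maproot=root -network 192.168.1.0 -mask 255.255.255.0
```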
Re: NFSv4: mount -t nsf4 not the same as mount_newnfs?
On Mon, 8 Feb 2010, O. Hartmann wrote:
> Mounting the filesystem via mount_newnfs host:/path /path

Oh, and you should set:

	sysctl vfs.newnfs.locallocks_enable=0

in the server, since I haven't fixed the local locking yet. (This
implies that apps/daemons running locally on the server won't see byte
range locks performed by NFSv4 clients.) However, byte range locking
between NFSv4 clients should work ok.

rick
Re: NFSv4: mount -t nsf4 not the same as mount_newnfs?
On Mon, 8 Feb 2010, O. Hartmann wrote:
> Mounting the filesystem via mount_newnfs host:/path /path works fine,
> but not mount -t nfs4 host:/path /path.

The mount command can be either:

	mount -t nfs -o nfsv4 host:/path /path
or
	mount -t newnfs -o nfsv4 host:/path /path

(The "-t nfs4" form above was what the old, now removed nfs4 client
used.)

Have fun with it, rick
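For an automatic mount at boot, the same options should carry over to
/etc/fstab. This is an untested sketch with placeholder names, assuming
fstab passes the nfsv4 option through to the mount unchanged:

```
# /etc/fstab - hypothetical entry; "host" and both paths are placeholders
host:/path    /path    nfs    rw,nfsv4    0    0
```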
Re: NFSv4: mount -t nsf4 not the same as mount_newnfs?
On Mon, 8 Feb 2010, O. Hartmann wrote:
>> Oh, and you should set:
>>
>> 	sysctl vfs.newnfs.locallocks_enable=0
>>
>> in the server, since I haven't fixed the local locking yet. (This
>> implies that apps/daemons running locally on the server won't see
>> byte range locks performed by NFSv4 clients.) However, byte range
>> locking between NFSv4 clients should work ok.
>
> Interesting, I see a lot of vfs.newnfs stuff on the server side, but
> not this specific OID. Do I miss something here?

Oops, make that:

	sysctl vfs.newnfs.enable_locallocks=0

rick
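To make the corrected setting survive a reboot, it can also go in
/etc/sysctl.conf on the server; a sketch using the corrected OID name:

```
# /etc/sysctl.conf - disable server-local byte-range locking until the
# local-locking code is fixed (note the corrected OID name:
# enable_locallocks, not locallocks_enable as first posted)
vfs.newnfs.enable_locallocks=0
```

The one-shot equivalent is running `sysctl
vfs.newnfs.enable_locallocks=0` as root on the server.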
Re: NFSv4: mount -t nsf4 not the same as mount_newnfs?
On Mon, 8 Feb 2010, O. Hartmann wrote:
> So I guess the above one is the more 'transparent' one with respect to
> the future, when NFSv4 matures and makes its way into the kernel?

Yea, I'd only use mount -t newnfs if for some reason you want to
test/use the experimental client for nfsv2,3 instead of the regular
one.

> I tried the above and it works. But it seems that only UFS2
> filesystems can be mounted by the client. When trying to mount a
> filesystem residing on ZFS, it fails. Mounting works, but when I try
> to access it or do a simple 'ls', I get
>
> ls: /backup: Permission denied
>
> On the server side, /etc/exports looks like
> --
> V4: / -sec=sys:krb5 #IPv4#
> /backup #IPv4#
> --
> Is there still an issue with ZFS?

For ZFS, everything from the root specified by the V4: line must be
exported at this time. So, if / isn't exported, the above won't work
for ZFS. You can either export / or move the NFSv4 root down to
/backup. For example, you could try:

	V4: /backup -sec=sys:krb5
	/backup

(assuming /backup is the ZFS volume) and then a mount like:

	mount -t nfs -o nfsv4 server:/ /mnt

will mount /backup on /mnt.

rick

ps: ZFS also has its own export stuff, but it is my understanding that
putting a line in /etc/exports is sufficient. I've never used ZFS, so
others will know more than I.
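Pulling Rick's suggestion together, a hedged sketch of the server-side
configuration for the ZFS case. The client address and dataset name are
hypothetical placeholders, and the sharenfs property is shown only as
the alternative mechanism mentioned in the ps; per Rick, the
/etc/exports lines alone should suffice:

```
# /etc/exports - move the NFSv4 root down to the ZFS volume so that the
# whole tree from the root is exported (client address is a placeholder)
V4: /backup -sec=sys:krb5 192.0.2.10
/backup 192.0.2.10

# Alternative, via ZFS's own export mechanism (dataset name is
# hypothetical):
#   zfs set sharenfs=on tank/backup
```

The client then mounts the new root with `mount -t nfs -o nfsv4
server:/ /mnt`, which makes /backup appear at /mnt.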
VFS KPI was Re: [OpenAFS-devel] Re: AFS ... or equivalent ...
On Wed, 16 Jan 2008, Robert Watson wrote:
[good stuff snipped]
> Right now we maintain a relatively stable VM/VFS KPI within a major
> release (i.e., FreeBSD 6.0 - 6.1 - 6.2 - 6.3), but see fairly
> significant changes between major releases (5.x - 6.x - 7.x, etc). I
> expect to see further changes in VFS for 8.x (and some of the
> locking-related ones have already started going in).

This is loosely related to both the OpenAFS thread and the Mac OS X ZFS
port thread, so I thought I'd ask... Has anyone considered trying to
bring the FreeBSD VFS KPI (and others, for that matter) closer to the
Darwin/Mac OS X ones? The Apple folks made quite dramatic changes to
their VFS when going from Panther (very FreeBSD-like) to Tiger, but
seem to have stabilized, at least for Leopard. It just seems that using
the Mac OS X KPIs might leverage some work being done on both sides? (I
don't know if there is an OpenAFS port to Mac OS X or interest in one,
but I would think there would be a use for one, if it existed?)

Although I'm far from an expert on the Mac OS X VFS (when I ported to
it, I just cribbed the code and it worked:-), it seems that they pretty
well got rid of the concept of a vnode lock. If the underlying file
system isn't SMP safe, it can put a lock on the subsystem at the VFS
call. (I think it optionally does a global lock or uses an smp lock in
the vnode, but don't quote me on this. My code currently runs with the
thread-safe flag false in the vfs_conf structure entry, which enables
the automagic locking.)

Just a thought, rick