NFS hangs
Hello,

We have a filesystem which is shared by 4 Linux guests via NFS. We have been using this setup for quite some time. Recently we've seen 2 cases where access to the NFS filesystem on 1 or more of the non-owning guests began to slow down. A df command on the affected system would stop before the NFS filesystem and hang for 10 seconds. We've stopped all the tasks using the filesystem, unmounted, and remounted it, with the same result. The only way to resolve the problem was to shut down and IPL the affected Linux guest(s). The owning guest, i.e. the one running the NFS server, never had to be bounced.

The network setup used for these mounts is a Guest LAN. Linux is SLES8 SP2, VM is V5.1. We take all the defaults for rsize, wsize, etc. in /etc/fstab for the mount.

/etc/fstab entry:

  192.168.47.65:/xs2files /xs2files nfs

If anyone has seen this scenario before, any insight would be much appreciated.

Thanks,
Hank Calzaretta
Acxiom Corp

* The information contained in this communication is confidential, is intended only for the use of the recipient named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please resend this communication to the sender and delete the original message or any copy of it from your computer system. Thank you. *

-- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: NFS hangs
Some questions:

- What other changes have taken place on the VM system?
- How big are the virtual machines?
- Do other commands on the affected Linux guests respond quickly (what about on the NFS server(s))?
- How much storage does your VM system have?
- What are the SRM settings for your VM system? (Q SRM from an appropriately privileged user)
- What does #CP IND Q report when the hang is happening?
- Do you have a performance tool on your system?

Neale
Re: NFS hangs
On Tuesday, 02/14/2006 at 11:37 EST, Neale Ferguson [EMAIL PROTECTED] wrote:

  Some questions:
  - What other changes have taken place on the VM system?
  - How big are the virtual machines?
  - Do other commands on the affected Linux guests respond quickly (what about on the NFS server(s))?
  - How much storage does your VM system have?
  - What are the SRM settings for your VM system? (Q SRM from an appropriately privileged user)
  - What does #CP IND Q report when the hang is happening?
  - Do you have a performance tool on your system?

And does #CP QUERY NIC DETAILS on the NFS server and client guests show TX/RX packet counts going up consistently? How do they compare with ifconfig on the guests? I.e., is data actually moving?

Alan Altmark
z/VM Development
IBM Endicott
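On the Linux side, the "are the packet counts going up?" check can be sketched roughly as below. This is only an illustration, not something posted in the thread: it assumes the guest exposes the usual /proc/net/dev counters (the same numbers ifconfig reports), and the function names are made up for the example. The z/VM side still needs #CP QUERY NIC DETAILS.

```python
def parse_proc_net_dev(text):
    """Return {interface: (rx_packets, tx_packets)} from /proc/net/dev text."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # After the colon: field 1 is RX packets, field 9 is TX packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_deltas(before, after):
    """Per-interface (rx, tx) packet increase between two samples."""
    return {
        name: (after[name][0] - before[name][0],
               after[name][1] - before[name][1])
        for name in after
        if name in before
    }


def read_counters(path="/proc/net/dev"):
    """Take one sample of the live counters (Linux only)."""
    with open(path) as handle:
        return parse_proc_net_dev(handle.read())
```

Taking read_counters() twice a few seconds apart while a df is hanging, and passing both samples to packet_deltas(), shows whether the interface is moving any packets at all during the stall.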
Re: NFS hangs
Neale,

- No changes have been made to z/VM.
- These are 1G WebSphere guests, each with a 768MB Java heap size.
- All other commands work fine on all 4 servers. Only commands going against the NFS mount, e.g. df, ls, on the affected system(s) run slowly until that system is bounced.
- z/VM has 17GB of storage configured as 13056MB central and 4352MB expanded.
- We've adjusted the SRM parameters, as has everyone running this environment:

    q srm
    IABIAS : INTENSITY=90%; DURATION=2
    LDUBUF : Q1=300% Q2=200% Q3=100%
    STORBUF: Q1=200% Q2=175% Q3=150%
    DSPBUF : Q1=32767 Q2=32767 Q3=32767
    DISPATCHING MINOR TIMESLICE = 5 MS
    MAXWSS : LIMIT=% .. : PAGES=99
    XSTORE : 0%

- I will run the #CP IND Q report when the problem occurs next.
- We run the IBM Perfkit.

Thanks,
Hank
Re: NFS hangs
Alan,

I will try those commands the next time we see the problem. The df and ls commands against the NFS mount do eventually return, after 10 to 20 seconds, so data is actually moving, albeit slowly.

Thanks,
Hank
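To put a number on the 10 to 20 second delay, something like the following could be run on an affected client. It is a minimal sketch, assuming that timing a single statvfs() call (the kind of call df makes per filesystem) is a fair stand-in for the df stall; the function names and the 1-second threshold are invented for the example.

```python
import os
import time


def time_statvfs(path):
    """Return (elapsed_seconds, statvfs_result) for one statvfs() call."""
    start = time.monotonic()
    result = os.statvfs(path)
    return time.monotonic() - start, result


def is_slow(elapsed, threshold=1.0):
    """True if a call took longer than threshold seconds (threshold is arbitrary)."""
    return elapsed > threshold
```

Calling time_statvfs("/xs2files") on a client and on the server at the same moment would show whether the delay is confined to the NFS client. Note that on a hard mount the call itself blocks for as long as the filesystem does, so the script hangs along with df.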
Re: NFS hangs
Calzaretta Henry - hcalza wrote:

  If anyone has seen this scenario before, any insight would be much appreciated.

I haven't seen it for some years; I recall it used to happen a lot with RHL 5.0, and I don't know when it stopped bothering me.

- What's in /etc/exports?
- Has one of the daemons died?
- Are you finding .nfsbla blah files getting left around?
- If you're exporting ro, does mounting -o nolock help?

--
Cheers
John

-- spambait
[EMAIL PROTECTED] [EMAIL PROTECTED]
Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/

do not reply off-list
Re: NFS hangs
John,

- Here are the contents of /etc/exports on the system running the NFS server:

    /xs2files 192.168.47.72(rw,sync,no_root_squash)

- The lock and portmap daemons appear to be running on the affected system(s).
- I don't see any .nfs* files in the filesystem.
- The files are mounted rw as you can see above.

Thanks,
Hank
Re: NFS hangs
We used to experience something similar to this. A mount would appear to hang but eventually succeed after a long time. We then ran the portmap service on all the clients and the problem went away. I never really fully understood why this solved the problem.
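A quick way to see whether a portmapper is at least listening on a client is to try a TCP connection to port 111, the standard portmap port. The sketch below is an illustration only: the function name is invented, and a successful connect shows that something accepts connections on that port, not that the NFS lock services are registered with it (rpcinfo -p is the proper tool for that).

```python
import socket


def port_is_open(host, port=111, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

Running port_is_open() against each of the client addresses would show which guests have no portmapper answering.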
Re: NFS hangs
Calzaretta Henry - hcalza wrote:

  John,
  - Here is contents of /etc/exports on the system running the NFS server: /xs2files 192.168.47.72(rw,sync,no_root_squash)
  - The lock and portmap daemons appear to be running on the effected system(s).
  - I don't see any .nfs* files in the filesystem.

Just in case you didn't look hard enough to see hidden files:

  find /xs2files -type f -name \.nfs\*

  - The files are mounted rw as you can see above.

:-) They're exported rw. I'll assume that you know what you're doing on this tho.

--
Cheers
John
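For anyone who would rather script the hidden-file check, the find command above corresponds roughly to the following. This is a sketch, not part of John's reply; the function name is made up, and /xs2files is simply the export from this thread.

```python
import os


def find_dot_nfs_files(root):
    """Return sorted paths of regular files under root whose names start with '.nfs'."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.startswith(".nfs"):
                continue
            path = os.path.join(dirpath, name)
            # Like find -type f: regular files only, symlinks excluded.
            if os.path.isfile(path) and not os.path.islink(path):
                matches.append(path)
    return sorted(matches)
```

Calling find_dot_nfs_files("/xs2files") on the server returns an empty list when no such files are lying around.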
Re: nfs hangs on NetApp NAS device
Adam Thornton writes:

  On Wed, 2004-02-25 at 12:42, McKown, John wrote:

    If you do not recommend the soft option (at least for R/W), what else is possible? If the NFS server dies or is unavailable for some reason, does that mean that all the client boxes which use it should die as well?

  Yes. If you're mounting files you need to have read-write, and the underlying filesystem goes away, you absolutely do not want to continue operations with the files you have open. If you do keep going, e.g. with a soft mount, you're looking at Data Corruption City.

To expand on this a little: there are two independent two-way choices for "how do I want the NFS filesystem to behave when it stops behaving like the local filesystem it's pretending to be?". One choice is soft v. hard, the other choice is intr v. nointr. The defaults are hard and nointr. The four combinations have the following properties:

hard,nointr
  The default. Makes the filesystem behave (a little more) like a local filesystem in the sense that a read or write of n bytes will wait uninterruptibly until it has fully succeeded or failed[*].

hard,intr
  The useful alternative. Weakens the pretence of local filesystem semantics but only a little. If an interrupt (SIGINT, Ctrl/C, ...) occurs during a read(), then it returns with errno EINTR or a short read (not sure if NFS will actually do the latter). This doesn't usually confuse applications, since EINTR must be handled anyway in the case where it arrives just before the read, and if the application is designed to cope with reading from terminals, pipes or devices then it needs to cope with short reads anyway. An EINTR in the middle of a write() is a bit nastier since you don't know what happened server-side (but then if you cared about exactly what data is on the server you'd either take more care of the NFS server or not use NFS).

soft,nointr (or soft,intr I suppose)
  This weakens the pretence of a normal local filesystem even more, at least insofar as people trust quality of implementation as well as the letter of the law. If the NFS server times out (either because it's down or because the network's congested or because various timeout values have been tweaked) then the read()/write() returns with errno EIO, meaning an I/O error. Now, many applications follow the methodology of "if you can't handle it, don't test for it" and others follow the methodology of being coded by a lazy git who doesn't even test for errors, in which case your data is toast. Yes, it would also be toast if the local filesystem started giving I/O errors, but such things are normally handled at a different level (shout at whoever implemented the RAID solution and/or the hardware vendor).

Of the choices available, hard,intr tends to give much more useful and safe semantics than soft but, even so, needs careful thought and effort which could have been prevented by more effort in making the NFS server more reliable. A default hard mount will pick up the read/write transparently when the server comes back up again, given the statelessness of NFS[*], so it's only long outages that matter.

--Malcolm

[*] Yes, those are lies but are close enough for this explanation.

--
Malcolm Beattie [EMAIL PROTECTED]
Linux Technical Consultant
IBM EMEA Enterprise Server Group...
...from home, speaking only for myself
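As an illustration of what "coping with short reads and EINTR" means for an application, here is a minimal sketch in Python. It is not taken from the thread and the function name is invented. One assumption worth flagging: current Python versions normally retry a system call interrupted by a signal on their own, so the EINTR branch is only a safety net. An EIO from a soft mount is deliberately not caught, so it surfaces to the caller as an OSError rather than being silently ignored.

```python
import os


def read_fully(fd, nbytes):
    """Read up to nbytes from fd, looping over short reads; stops early at EOF."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        try:
            chunk = os.read(fd, remaining)
        except InterruptedError:
            # EINTR: nothing was read, so retrying the read is safe.
            continue
        # Any other OSError (for example errno EIO) propagates to the caller.
        if not chunk:
            break  # end of file
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The write side is the harder case described above: after an interrupted write() the client cannot tell how much reached the server, so a retry loop alone does not make it safe.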
Re: nfs hangs on NetApp NAS device
-Original Message-
From: Betsie Spann [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 25, 2004 12:12 PM
To: [EMAIL PROTECTED]
Subject: nfs hangs on NetApp NAS device

  My RH AS 3.0 system frequently hangs with the message nfs: server pafiler not responding It's a NetApp NAS device that is ro nfs mounted by the entire universe here. My Linux guest frequently waits on it. Then it has to be rebooted. Network restarts hang on the server also. I am using NFS vers 3 with a timeo value of 40 and rsize/wsize of 8192 using UDP. I'm going to try the tcp option next. Any suggestions or known problems, please? Betsie

Our NetApp rep told us to use:

  rw,bg,vers=3,tcp,rsize=8192,wsize=8192,hard,intr

--
Thomas Cameron, RHCE, CNE, MCSE, MCT
Assistant Vice President
Linux Design and Engineering
Bank of America
(972) 997-9641

The opinions expressed in this message are mine alone and do not necessarily reflect the opinions of my employer, Bank of America.
nfs hangs on NetApp NAS device
My RH AS 3.0 system frequently hangs with the message

  nfs: server pafiler not responding

It's a NetApp NAS device that is ro nfs mounted by the entire universe here. My Linux guest frequently waits on it. Then it has to be rebooted. Network restarts hang on the server also.

I am using NFS vers 3 with a timeo value of 40 and rsize/wsize of 8192 using UDP. I'm going to try the tcp option next. Any suggestions or known problems, please?

Betsie
Re: nfs hangs on NetApp NAS device
I _think_ you need to do a soft NFS mount instead of a hard mount. Try http://www.faqs.org/docs/linux_network/x-087-2-nfs.mountd.html Look at the soft option. The hard option is the default.

--
John McKown
Senior Systems Programmer
UICI Insurance Center
Applications Solutions Team

This message (including any attachments) contains confidential information intended for a specific individual and purpose, and its' content is protected by law. If you are not the intended recipient, you should delete this message and are hereby notified that any disclosure, copying, or distribution of this transmission, or taking any action based on it, is strictly prohibited.
Re: nfs hangs on NetApp NAS device
On Wed, 2004-02-25 at 12:17, McKown, John wrote:

  I _think_ you need to do a soft NFS mount instead of a hard mount. Try http://www.faqs.org/docs/linux_network/x-087-2-nfs.mountd.html Look at the soft option. The hard option is the default.

Well, then you won't have to reboot. But it does mean the data you're getting isn't guaranteed. If it's r/o that's probably OK. Never ever ever mount soft with rw. At least, that's *my* advice.

Adam
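The "never mount soft with rw" rule is easy to check mechanically against an option string from /etc/fstab. The sketch below is illustrative only: the function name and warning text are invented, it assumes the defaults stated in this thread (hard, and read-write when neither ro nor rw is given), and it assumes the last of two conflicting options wins.

```python
def soft_rw_warning(option_string):
    """Return a warning if the options give a soft, read-write mount, else None."""
    soft = False       # hard is the default
    read_only = False  # rw is the default
    for item in option_string.split(","):
        name = item.strip()
        if name == "soft":
            soft = True
        elif name == "hard":
            soft = False
        elif name == "ro":
            read_only = True
        elif name == "rw":
            read_only = False
    if soft and not read_only:
        return "soft mount is read-write: timeouts can surface as I/O errors mid-write"
    return None
```

With this, the option string the NetApp rep recommended in this thread (rw,bg,vers=3,tcp,rsize=8192,wsize=8192,hard,intr) produces no warning, while rw,soft does.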
Re: nfs hangs on NetApp NAS device
Adam,

If you do not recommend the soft option (at least for R/W), what else is possible? If the NFS server dies or is unavailable for some reason, does that mean that all the client boxes which use it should die as well? I'm truly curious because I don't use NFS much. In fact, here at work, we don't use it at all. I do use it at home to cross-connect two Linux/Intel boxes.

--
John McKown
Senior Systems Programmer
UICI Insurance Center
Applications Solutions Team
Re: nfs hangs on NetApp NAS device
When an NFS server dies or is unavailable, the clients should wait until the server restarts or comes back online. This is what NFS is designed to do anyway. If your box is completely hung, there is probably some problem with the client NFS implementation. If your box does not return to normal when the NFS server comes back online, then there is certainly some problem with the NFS client. Note that if the server is down for a long time, it may take the client a long time to realize that the server is back online.
Re: nfs hangs on NetApp NAS device
On Wed, 2004-02-25 at 12:42, McKown, John wrote:

  If you do not recommend the soft option (at least for R/W), what else is possible? If the NFS server dies or is unavailable for some reason, does that mean that all the client boxes which use it should die as well?

Yes. If you're mounting files you need to have read-write, and the underlying filesystem goes away, you absolutely do not want to continue operations with the files you have open. If you do keep going, e.g. with a soft mount, you're looking at Data Corruption City.

  I'm truly curious because I don't use NFS much. In fact, here at work, we don't use it at all. I do use it at home to cross-connect two Linux/Intel boxes.

I'm not a fan of NFS, although I am given to understand that v3 and v4 work a little better than v2 did. AFS has a lot of nice features, but it's intrusive and doesn't work quite like a normal Unix filesystem. GFS looked promising but I haven't really followed it recently. A reasonably-performing distributed read-write filesystem with Unix semantics would be a wonderful thing...but I don't know of any such thing.

Adam
Re: nfs hangs on NetApp NAS device
On Wed, 2004-02-25 at 20:03, Adam Thornton wrote:

  I'm not a fan of NFS, although I am given to understand that v3 and v4 work a little better than v2 did.

v2 NFS is fairly simplistic
v3 adds support for files larger than 2GB and support for client-side asynchronous writeback done safely
v4 adds a ton of stuff but is very new

Generally speaking there isn't a good reason to run v2 except between boxes that don't speak v3.
Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)
Try 'mount -o nfsvers=2 ' WBR, Sergey
Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)
Ted,

I have had a similar situation mounting an NFS volume (from a Sun) to either a RH-7.2 or a RH-RawHide VM. The processes in question seem to be sleeping in function down or wait_on_inode. So they look like they're in uninterruptible sleep, so they don't get scheduled, so they never receive their termination signals. The problem occurs with automount and ordinary mount, but much more with automount.

If a server goes down and the hard option is specified, the client's process(es) rightly hang until the server comes back. However, in this case, NFS handles seem to get lost/broken in a way that the client's processes think the server is down when it isn't, so they hang. (Also, a particular annoyance is that processes in uninterruptible sleep are counted in the load average, so there is a high load average without any load on the processor.)

Conclusion (guessed): The problem is in the kernel NFS code; perhaps search the source for:

- wait_on_inode (in fs/inode.c)
- nfs_wait_on_inode (in fs/inode.c)
- down (in asm/semaphore.h)

I hope this helps. (It looks at least like you're closer to absolving the VMNFS side of things :-)

Ashley Chaloner. DCS,UoW,UK.
http://www.dcs.warwick.ac.uk/~csuwf/

On Thu, 10 Jul 2003, Ted Manos wrote:

  Date: Thu, 10 Jul 2003 21:21:15 -0500
  From: Ted Manos [EMAIL PROTECTED]
  Reply-To: Linux on 390 Port [EMAIL PROTECTED]
  To: [EMAIL PROTECTED]
  Subject: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

(cross-posted to [EMAIL PROTECTED] and [EMAIL PROTECTED])

Hello all (particularly Alan, Romney and crew!),

We have been doing testing with a new development version of SAS V9 for Linux390 for a couple months now, and had not run into any major issues until just recently. We are near the end of our Proof of Concept, and just ran into this problem which is a major stumbling block for us. It appears to be an NFS locking issue, and not due to SAS.
However, I learned many, many moons ago, back when most of my hair wasn't grey (or I even *had* most of my hair, for that matter!) to never rule out ANYTHING until a problem/issue is resolved. The problem is that when we try to write a SAS-format dataset from SAS/Linux to an NFS-mounted CMS SFS directory, it hangs NFS. If we write the SAS dataset to a local Linux directory, everything is fine. If we have SAS read and/or write a flat-file to the NFS-mounted SFS directory, things are fine. If we have SAS *read* a SAS dataset from an NFS-mounted CMS SFS directory (the dataset was created on SAS/Unix then Ftp'd to SFS), everything is fine. But if we try to either update or create a new SAS dataset on the NFS-mounted SFS directory, it hangs things up tight. (Note that we are *NOT* trying to read or write the SAS datasets from SAS in CMS or anywhere else... just use the NFS-mounted SFS directory space as a remote storage pool.) When it hangs, the only way to get rid of all the remaining spawn zombies is to re-IPL the Linux guest. The kill command will terminate most of the processes, but not all of them. (Yes, I tried killing them from root... every way I knew how... but am always open to new ideas/suggestions!) I have no idea at this stage where the hang-up is occurring -- in the Linux NFS software, the Linux kernel itself, the VM/CMS NFS server software, one of the IP stacks, SAS, or someplace else. I'm not even sure at this stage how to go about tracking it down, since there are a number of parts/pieces that all come into play at various stages (I can function fairly well in Linux, but I'm no real geek Linux hacker!). By hung, I mean that all I/O (at least as far as I can tell) between the SAS program running on Linux, the Linux NFS client representing the particular Linux mount point/directory being used, and the VMNFS NFS server had ceased to occur. Also, any further attempts to initiate I/O to that NFS mount point, from any other ID/process also hang. 
Even root is no longer able to do a simple directory on the mount point (e.g. ls -l /terry), it hangs. It appears to be hung due to some kind of lock, or pending some condition/state. That I can readily ascertain, there is no CPU or I/O being burned in a loop. I do not believe that the problem is SFS, or that SFS is hung. SFS continues to function perfectly normally when accessed from CMS. I also don't *think* that it is the VMNFS server, as that appears to continue to function normally for any/all other mount points it is serving, just not the one that has hung. When I kill the originating process, and finally get it and all of its spawn killed off, there still remain two of its spawn which I can not kill, even from root, no matter what signal I try to use. The only way I am able to reset everything to that mount point, so it can again be made operational, is to completely shutdown and re-IPL that Linux instance, and then re-mount all the NFS mount points. I do NOT have to do *anything
Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)
On Fri, 11 Jul 2003, Ashley Chaloner wrote:

> (Also, a particular annoyance is that processes in uninterruptible sleep are counted in the load average so there is a high load average without any load on the processor.)

Load average counts active processes, and a process that is actively waiting on a device is counted too. It's not just CPU activity. Processes waiting for their turn at the CPU are counted as well, and that's why one of my systems went past 114 a few months ago.

--
Cheers
John.

Join the Linux Support by Small Businesses list at http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
Copyright John Summerfield. Reproduction prohibited.
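The point about load average can be checked on a running guest: Linux counts tasks in uninterruptible sleep (state D) alongside runnable tasks (state R). Below is a minimal Python sketch, assuming a /proc filesystem with the usual /proc/loadavg and /proc/<pid>/stat layout. The function names state_of and count_states are made up for this illustration and do not come from the thread.

```python
from collections import Counter
from pathlib import Path


def state_of(stat_text):
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split after the last ')'.
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_states(proc_root="/proc"):
    """Count processes per state letter (R, S, D, Z, ...)."""
    counts = Counter()
    root = Path(proc_root)
    if not root.is_dir():
        return counts
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            counts[state_of((entry / "stat").read_text())] += 1
        except (OSError, ValueError, IndexError):
            continue  # the process went away or its entry was unreadable
    return counts


if __name__ == "__main__":
    loadavg = Path("/proc/loadavg")
    if loadavg.exists():
        states = count_states()
        print("1-minute load average:", loadavg.read_text().split()[0])
        print("runnable (R):", states["R"], "uninterruptible (D):", states["D"])
```

A high load average together with a large D count and a small R count would match the "load without CPU activity" picture described above.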
Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)
On the Linux side, mount the SFS (BFS) directory with rsize=1024 and wsize=1024 and see whether it hangs. Also, we need your routing configuration: how many hops are there?

John Summerfield [EMAIL PROTECTED]
Sent by: Linux on 390 Port [EMAIL PROTECTED]
07/11/2003 02:10 PM
Please respond to Linux on 390 Port

To: [EMAIL PROTECTED]
cc: (bcc: Eddie Chen/SIAC)
Subject: Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)
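The rsize/wsize suggestion in this message can be written out as a concrete mount command. The Python sketch below only builds and prints the command, so nothing is mounted by running it. The export name vmhost:/export is a placeholder (the thread does not give the real VMNFS export), /terry is the mount point mentioned in the original report, and nfs_mount_command is a made-up helper name.

```python
import shlex


def nfs_mount_command(export, mount_point, rsize=1024, wsize=1024, extra=()):
    """Build the argument list for an NFS mount with explicit transfer sizes."""
    options = [f"rsize={rsize}", f"wsize={wsize}", *extra]
    return ["mount", "-t", "nfs", "-o", ",".join(options), export, mount_point]


if __name__ == "__main__":
    # "vmhost:/export" is a placeholder; substitute the real server and export.
    command = nfs_mount_command("vmhost:/export", "/terry")
    print(shlex.join(command))
```

Running the printed command needs root, and the mount point has to be unmounted first. Further options can be passed through the extra argument.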
Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)
On Thu, 2003-07-10 at 21:21, Ted Manos wrote:

> I do not believe that the problem is SFS, or that SFS is hung. SFS continues to function perfectly normally when accessed from CMS. I also don't *think* that it is the VMNFS server, as that appears to continue to function normally for any/all other mount points it is serving, just not the one that has hung. When I kill the originating process, and finally get it and all of its spawn killed off, there still remain two of its spawn which I can not kill, even from root, no matter what signal I try to use. The only way I am able to reset everything to that mount point, so it can again be made operational, is to completely shutdown and re-IPL that Linux instance, and then re-mount all the NFS mount points. I do NOT have to do *anything* whatever to VM, SFS or the VMNFS server.

This sure sounds like classic NFS-client-stuck-in-disk-wait behavior. This is also why NFS sucks, but that's neither here nor there. One thing I'd suggest:

> I am running 31-bit Linux 2.4.7-SuSE-SMP #1 SMP Wed Oct 17 15:31:03 GMT 2001 s390 under z/VM V4.3.0 (PUT 0301) on a 2064-1Cx IFL processor.

Is this the latest and greatest patched kernel from SuSE? IIRC, SuSE likes to put the NFS server in the kernel, and so it's possible that this is a bug that has already been addressed. If it were *me*, I'd give it a shot with a 2.4.21 kernel and see what happened.

Adam
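One way to confirm the disk-wait diagnosis is to list the processes in state D and, where the kernel exposes it, the kernel function each one is sleeping in. This is a minimal Python sketch under stated assumptions: /proc/<pid>/stat has the usual layout, and /proc/<pid>/wchan exists, which may not be true on a kernel as old as the 2.4.7 discussed here. The names parse_stat, wait_channel and blocked_processes are invented for the example.

```python
from pathlib import Path


def parse_stat(stat_text):
    """Split the text of /proc/<pid>/stat into (pid, command, state)."""
    head, tail = stat_text.rsplit(")", 1)
    pid, command = head.split(" (", 1)
    return int(pid), command, tail.split()[0]


def wait_channel(proc_dir):
    """Return the wait channel name, or '?' if it cannot be read."""
    try:
        return (proc_dir / "wchan").read_text().strip() or "?"
    except OSError:
        return "?"


def blocked_processes(proc_root="/proc"):
    """List (pid, command, wait channel) for processes in state D."""
    root = Path(proc_root)
    if not root.is_dir():
        return []
    found = []
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            pid, command, state = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while being inspected
        if state == "D":
            found.append((pid, command, wait_channel(entry)))
    return found


if __name__ == "__main__":
    for pid, command, channel in blocked_processes():
        print(pid, command, channel)
```

Processes that stay in this list across repeated runs and ignore every signal are the kind of unkillable leftovers described in the original report.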
Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)
No, there have been some updates since then:

~ uname -a
Linux 2.4.7-timer-SMP #1 SMP Tue May 21 12:58:16 GMT 2002 s390 unknown

There is a non-timer-patched version from the same date.

Mark Post

-Original Message-
From: Adam Thornton [mailto:[EMAIL PROTECTED]
Sent: Friday, July 11, 2003 11:28 AM
To: [EMAIL PROTECTED]
Subject: Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

On Thu, 2003-07-10 at 21:21, Ted Manos wrote:
-snip-
> I am running 31-bit Linux 2.4.7-SuSE-SMP #1 SMP Wed Oct 17 15:31:03 GMT 2001 s390 under z/VM V4.3.0 (PUT 0301) on a 2064-1Cx IFL processor.

Is this the latest and greatest patched kernel from SuSE? IIRC, SuSE likes to put the NFS server in the kernel, and so it's possible that this is a bug that has already been addressed. If it were *me*, I'd give it a shot with a 2.4.21 kernel and see what happened.

Adam
NFS hangs writing to SFS from SAS/Linux390 (moderately long)
(cross-posted to [EMAIL PROTECTED] and [EMAIL PROTECTED])

Hello all (particularly Alan, Romney and crew!),

We have been doing testing with a new development version of SAS V9 for Linux390 for a couple of months now, and had not run into any major issues until just recently. We are near the end of our Proof of Concept, and just ran into this problem, which is a major stumbling block for us. It appears to be an NFS locking issue, and not due to SAS. However, I learned many, many moons ago, back when most of my hair wasn't grey (or I even *had* most of my hair, for that matter!) to never rule out ANYTHING until a problem/issue is resolved.

The problem is that when we try to write a SAS-format dataset from SAS/Linux to an NFS-mounted CMS SFS directory, it hangs NFS. If we write the SAS dataset to a local Linux directory, everything is fine. If we have SAS read and/or write a flat file to the NFS-mounted SFS directory, things are fine. If we have SAS *read* a SAS dataset from an NFS-mounted CMS SFS directory (the dataset was created on SAS/Unix then FTP'd to SFS), everything is fine. But if we try to either update or create a new SAS dataset on the NFS-mounted SFS directory, it hangs things up tight. (Note that we are *NOT* trying to read or write the SAS datasets from SAS in CMS or anywhere else... just use the NFS-mounted SFS directory space as a remote storage pool.)

When it hangs, the only way to get rid of all the remaining spawn zombies is to re-IPL the Linux guest. The kill command will terminate most of the processes, but not all of them. (Yes, I tried killing them from root... every way I knew how... but am always open to new ideas/suggestions!)

I have no idea at this stage where the hang-up is occurring -- in the Linux NFS software, the Linux kernel itself, the VM/CMS NFS server software, one of the IP stacks, SAS, or someplace else. I'm not even sure at this stage how to go about tracking it down, since there are a number of parts/pieces that all come into play at various stages (I can function fairly well in Linux, but I'm no real geek Linux hacker!).

By hung, I mean that all I/O (at least as far as I can tell) between the SAS program running on Linux, the Linux NFS client representing the particular Linux mount point/directory being used, and the VMNFS NFS server has ceased to occur. Also, any further attempts to initiate I/O to that NFS mount point, from any other ID/process, also hang. Even root is no longer able to do a simple directory listing on the mount point (e.g. ls -l /terry); it hangs. It appears to be hung due to some kind of lock, or pending some condition/state. As far as I can readily ascertain, there is no CPU or I/O being burned in a loop.

I do not believe that the problem is SFS, or that SFS is hung. SFS continues to function perfectly normally when accessed from CMS. I also don't *think* that it is the VMNFS server, as that appears to continue to function normally for any/all other mount points it is serving, just not the one that has hung. When I kill the originating process, and finally get it and all of its spawn killed off, there still remain two of its spawn which I cannot kill, even from root, no matter what signal I try to use. The only way I am able to reset everything to that mount point, so it can again be made operational, is to completely shut down and re-IPL that Linux instance, and then re-mount all the NFS mount points. I do NOT have to do *anything* whatever to VM, SFS or the VMNFS server.

Does that absolve them completely?? LOL... not in MY lifetime! I've been doing this stuff WAYYY too long to believe that until it is PROVEN to me. It is certainly possible that the hang is being caused by some bad/goofy permission within Linux, NFS, VMNFS or SFS itself, or even VMSECURE or the ESM... or any other part or piece that may come into play. But, I do tend to *doubt* it, since everything else continues to function as it should, and the Linux NFS mount point comes back and functions normally after Linux has been recycled and the NFS mounts re-issued.

Unless I am missing something somewhere, it is my belief that an NFS-mounted SFS directory should not appear any different to Linux/Unix than any other type of file system structure (with the exception of the 8.8 filename limitation), since it is a hierarchical tree directory structure and supports very large records. Record format, record length and blocking (if any) shouldn't really be a factor if the file is just being written from Linux and read by Linux, with nothing else coming along in between and mucking with things. The NFS-mounted SFS should just look like any other Linux/Unix directory/filesystem -- just a pool of disk space available to use until you've hit your quota.

I am running 31-bit Linux 2.4.7-SuSE-SMP #1 SMP Wed Oct 17 15:31:03 GMT 2001 s390 under z/VM V4.3.0 (PUT 0301) on a 2064-1Cx IFL processor.

Below is some related file information which may help someone to show me the
Re: [VMESA-L] NFS hangs writing to SFS from SAS/Linux390 (moderately long)
What was the format of the mount command that you used? (That is, what options were specified? Just enter mount to display them.) If you haven't already, try specifying intr and soft as options. This way you should be able to kill things without a re-IPL.

Also, what rsize/wsize did you specify? What's the MTU between the Linux guest and the NFS server? Are you using NFS v2 or v3? If v3, are you using TCP rather than UDP?

Neale
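Most of the questions above (hard or soft, intr, rsize/wsize, NFS version, TCP or UDP) can be answered from the options the kernel reports for each mount. Here is a minimal Python sketch that pulls them out of /proc/mounts-style text. It assumes the usual six-field layout, and which options actually appear depends on the kernel version. The helper names nfs_mounts and summarize are made up for this example.

```python
from pathlib import Path

NFS_TYPES = {"nfs", "nfs4"}


def nfs_mounts(mounts_text):
    """Return (device, mount_point, options) for each NFS line."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in NFS_TYPES:
            continue
        options = {}
        for item in fields[3].split(","):
            key, sep, value = item.partition("=")
            options[key] = value if sep else True
        found.append((fields[0], fields[1], options))
    return found


def summarize(options):
    """Pick out the settings asked about; None means 'not listed'."""
    version = options.get("vers") or options.get("nfsvers")
    if version is None:
        # Some kernels list the version as a bare flag such as v3.
        version = next((v[1:] for v in ("v2", "v3") if v in options), None)
    protocol = options.get("proto")
    if protocol is None:
        protocol = next((p for p in ("tcp", "udp") if p in options), None)
    return {
        "hard_or_soft": next((m for m in ("soft", "hard") if m in options), None),
        "intr": "intr" in options,
        "rsize": options.get("rsize"),
        "wsize": options.get("wsize"),
        "version": version,
        "protocol": protocol,
    }


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():
        for device, mount_point, options in nfs_mounts(mounts.read_text()):
            print(device, mount_point, summarize(options))
```

The MTU question is not covered here; that has to be read from the network interface configuration on both ends.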