NFS hangs

2006-02-14 Thread Calzaretta Henry - hcalza
Hello,

We have a filesystem which is shared by 4 Linux guests via NFS.  We have
been using this setup for quite some time.  Recently we've seen 2 cases
where access to the NFS filesystem on 1 or more of the non-owning guests
began to slow down.  A df command on the affected system would stop
before the NFS filesystem and hang for 10 seconds.  We've stopped all the
tasks using the filesystem, unmounted, and remounted it, with the same
result.  The only way to resolve the problem was to shut down and IPL the
affected Linux guest(s).  The owning guest, i.e. the one running the NFS
server, never had to be bounced.

The network setup used for these mounts is a Guest LAN.  Linux is SLES8
SP2, VM is V5.1.  We take all the defaults for rsize, wsize, etc. in
/etc/fstab for the mount.

/etc/fstab entry:

192.168.47.65:/xs2files /xs2files nfs

If anyone has seen this scenario before, any insight would be much
appreciated.

Thanks,

Hank Calzaretta
Acxiom Corp

*
The information contained in this communication is confidential, is
intended only for the use of the recipient named above, and may be
legally privileged.

If the reader of this message is not the intended recipient, you are 
hereby notified that any dissemination, distribution or copying of this
communication is strictly prohibited.

If you have received this communication in error, please resend this
communication to the sender and delete the original message or any copy
of it from your computer system.

Thank you.
*

--
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390


Re: NFS hangs

2006-02-14 Thread Neale Ferguson
Some questions:
- What other changes have taken place on the VM system?
- How big are the virtual machines?
- Do other commands on the affected Linux guests respond quickly (what
about on the NFS server(s))?
- How much storage does your VM system have?
- What are the SRM settings for your VM system? (Q SRM from an
appropriately privileged user)
- What does #CP IND Q report when the hang is happening?
- Do you have a performance tool on your system?

Neale


Re: NFS hangs

2006-02-14 Thread Alan Altmark
On Tuesday, 02/14/2006 at 11:37 EST, Neale Ferguson [EMAIL PROTECTED]
wrote:
> Some questions:
> - What other changes have taken place on the VM system?
> - How big are the virtual machines?
> - Do other commands on the affected Linux guests respond quickly (what
>   about on the NFS server(s))?
> - How much storage does your VM system have?
> - What are the SRM settings for your VM system? (Q SRM from an
>   appropriately privileged user)
> - What does #CP IND Q report when the hang is happening?
> - Do you have a performance tool on your system?

And does #CP QUERY NIC DETAILS on the NFS server and client guests show
TX/RX packet counts going up consistently?  Compared with ifconfig on the
guests?  I.e., is data actually moving?
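
On the Linux side, the "is data actually moving?" check can be sketched by sampling the kernel's per-interface packet counters twice and comparing them. This is only an illustration, not something posted in the thread: it assumes the standard /proc/net/dev layout (two header lines, then one line per interface), and the function names are made up for the example.

```python
def parse_net_dev(text):
    """Parse the text of /proc/net/dev into {iface: (rx_packets, tx_packets)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # 0-based field 1 is RX packets, field 9 is TX packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_deltas(before, after):
    """Return {iface: (rx_delta, tx_delta)} for interfaces present in both samples."""
    return {
        name: (after[name][0] - before[name][0], after[name][1] - before[name][1])
        for name in before
        if name in after
    }


def read_counters(path="/proc/net/dev"):
    """Take one sample of the counters from the running system."""
    with open(path) as handle:
        return parse_net_dev(handle.read())
```

Calling read_counters() before and after a slow df and passing both samples to packet_deltas() shows whether the counters advanced during the stall. Those deltas can then be set against what QUERY NIC DETAILS reports for the same guest.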

Alan Altmark
z/VM Development
IBM Endicott


Re: NFS hangs

2006-02-14 Thread Calzaretta Henry - hcalza
Neale,

- No changes have been made to z/VM.

- These are 1G WebSphere guests, each with 768MB java heap size.

- All other commands work fine on all 4 servers.  Only commands going
against the NFS mount, e.g. df, ls, on the affected system(s) run slowly
until that system is bounced.

- z/VM has 17GB of storage configured as 13056MB central and 4352MB
expanded.

- We've adjusted the SRM parameters as has everyone running this
environment:
q srm   
IABIAS : INTENSITY=90%; DURATION=2  
LDUBUF : Q1=300% Q2=200% Q3=100%
STORBUF: Q1=200% Q2=175% Q3=150%
DSPBUF : Q1=32767 Q2=32767 Q3=32767 
DISPATCHING MINOR TIMESLICE = 5 MS  
MAXWSS : LIMIT=%
.. : PAGES=99   
XSTORE : 0%
 
- I will run the #CP IND Q report when the problem occurs next.

- We run the IBM Perfkit.

Thanks,
Hank


Re: NFS hangs

2006-02-14 Thread Calzaretta Henry - hcalza
Alan,

I will try those commands the next time we see the problem.  The df and
ls commands against the NFS do eventually return, after 10 to 20
seconds, so data is actually moving, albeit slowly.

Thanks,
Hank


Re: NFS hangs

2006-02-14 Thread John Summerfied

Calzaretta Henry - hcalza wrote:
> If anyone has seen this scenario before, any insight would be much
> appreciated.

I haven't seen it for some years; I recall it used to happen a lot with
RHL 5.0, and I don't know when it stopped bothering me.

What's in /etc/exports?

Has one of the daemons died?

Are you finding .nfs* files getting left around?

If you're exporting ro, does mounting with -o nolock help?



--

Cheers
John

-- spambait
[EMAIL PROTECTED]  [EMAIL PROTECTED]
Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/

do not reply off-list


Re: NFS hangs

2006-02-14 Thread Calzaretta Henry - hcalza
John,

- Here are the contents of /etc/exports on the system running the NFS
server:

/xs2files 192.168.47.72(rw,sync,no_root_squash)

- The lock and portmap daemons appear to be running on the affected
system(s).

- I don't see any .nfs* files in the filesystem.

- The files are mounted rw as you can see above.

Thanks,
Hank


Re: NFS hangs

2006-02-14 Thread Ifurung, [EMAIL PROTECTED]
We used to experience something similar to this.  A mount would appear
to hang but eventually succeed after a long time.  We then ran the
portmap service on all the clients and the problem went away.  I never
really fully understood why this solved the problem.


Re: NFS hangs

2006-02-14 Thread John Summerfied

Calzaretta Henry - hcalza wrote:
> John,
>
> - Here is contents of /etc/exports on the system running the NFS server:
>
> /xs2files 192.168.47.72(rw,sync,no_root_squash)
>
> - The lock and portmap daemons appear to be running on the affected
> system(s).
>
> - I don't see any .nfs* files in the filesystem.

Just in case you didn't look hard enough to see hidden files:

find /xs2files -type f -name '.nfs*'

> - The files are mounted rw as you can see above.

:-) They're exported rw. I'll assume that you know what you're doing on
this, though.

--

Cheers
John


Re: nfs hangs on NetApp NAS device

2004-02-26 Thread Malcolm Beattie
Adam Thornton writes:
> On Wed, 2004-02-25 at 12:42, McKown, John wrote:
> > If you do not recommend the soft option (at least for R/W), what else is
> > possible? If the NFS server dies or is unavailable for some reason, does
> > that mean that all the client boxes which use it should die as well?
>
> Yes.
>
> If you're mounting files you need to have read-write, and the underlying
> filesystem goes away, you absolutely do not want to continue operations
> with the files you have open.  If you do keep going, e.g. with a soft
> mount, you're looking at Data Corruption City.

To expand on this a little: there are two independent two-way choices
for "how do I want the NFS filesystem to behave when it stops behaving
like the local filesystem it's pretending to be?". One choice is
soft v. hard, the other choice is intr v. nointr. The defaults are
hard and nointr. The four combinations have the following properties:

hard,nointr
  The default. Makes the filesystem behave (a little more) like a local
  filesystem in the sense that a read or write of n bytes will wait
  uninterruptibly until it has fully succeeded or failed[*].
hard,intr
  The useful alternative. Weakens the pretence of local filesystem
  semantics but only a little. If an interrupt (SIGINT, Ctrl/C, ...)
  occurs during a read(), then it returns with errno EINTR or a
  short read (not sure if NFS will actually do the latter).  This
  doesn't usually confuse applications since EINTR must be handled
  anyway in the case it arrives just before the read and if the
  application is designed to cope with reading from terminals, pipes
  or devices then it needs to cope with short reads anyway. An EINTR
  in the middle of a write() is a bit nastier since you don't know
  what happened server-side (but then if you cared about exactly what
  data is on the server you'd either take more care of the NFS
  server or not use NFS).
soft,nointr (or soft,intr I suppose)
  This weakens the pretence of a normal local filesystem even more,
  at least insofar as people trust quality of implementation as
  well as the letter of the law. If the NFS server times out (either
  because it's down or because the network's congested or because
  various timeout values have been tweaked) then the read()/write()
  returns with errno EIO meaning an I/O error. Now, many applications
  follow the methodology of "if you can't handle it, don't test for it"
  and others follow the methodology of "being coded by a lazy git who
  doesn't even test for errors", in which case your data is toast. Yes,
  it would also be toast if the local filesystem started giving I/O
  errors but such things are normally handled at a different level
  (shout at whoever implemented the RAID solution and/or the hardware
  vendor).

Of the choices available, hard,intr tends to give much more useful
and safe semantics than soft but, even so, needs careful thought
and effort which could have been prevented by more effort in making
the NFS server more reliable. A default hard mount will pick up
the read/write transparently when the server comes back up again
given the statelessness of NFS[*] so it's only long outages that
matter.
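
To make the application-side burden concrete, here is a minimal sketch of a read loop that copes with the short reads an intr mount may produce and refuses to treat the EIO from a timed-out soft mount as success. The helper name read_exact is hypothetical, and this shows one defensive pattern, not how any particular application handles it.

```python
import errno
import os


def read_exact(fd, count):
    """Read up to `count` bytes from fd, looping over short reads.

    Fewer bytes are returned only at end of file. An EIO error is
    re-raised with context instead of being swallowed.
    """
    chunks = []
    remaining = count
    while remaining > 0:
        try:
            chunk = os.read(fd, remaining)
        except InterruptedError:
            # EINTR. Python 3.5+ normally retries this itself (PEP 475);
            # the explicit retry just documents the intent.
            continue
        except OSError as exc:
            if exc.errno == errno.EIO:
                raise OSError(errno.EIO, "I/O error, data may be incomplete") from exc
            raise
        if not chunk:  # end of file
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

A caller that wants all of a file's data should treat a raised EIO as fatal for that operation. The write side needs the same care, and more, because an interrupted or failed write leaves the server-side state unknown.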

--Malcolm

[*] Yes, those are lies but are close enough for this explanation.

--
Malcolm Beattie [EMAIL PROTECTED]
Linux Technical Consultant
IBM EMEA Enterprise Server Group...
...from home, speaking only for myself


Re: nfs hangs on NetApp NAS device

2004-02-26 Thread Cameron, Thomas
> -----Original Message-----
> From: Betsie Spann [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 25, 2004 12:12 PM
> To: [EMAIL PROTECTED]
> Subject: nfs hangs on NetApp NAS device
>
> My RH AS 3.0 system frequently hangs with the message "nfs:
> server pafiler not responding".
> It's a NetApp NAS device that is ro nfs mounted by the entire
> universe here.  My Linux guest frequently waits on it.  Then
> it has to be rebooted.  Network restarts hang on the server also.
> I am using NFS vers 3 with a timeo value of 40 and
> rsize/wsize of 8192 using UDP.  I'm going to try the tcp
> option next.
> Any suggestions or known problems, please?
> Betsie


Our NetApp rep told us to use:

rw,bg,vers=3,tcp,rsize=8192,wsize=8192,hard,intr

--
Thomas Cameron, RHCE, CNE, MCSE, MCT
Assistant Vice President
Linux Design and Engineering
Bank of America
(972) 997-9641

The opinions expressed in this message are mine alone and do not necessarily reflect 
the opinions of my employer, Bank of America.


nfs hangs on NetApp NAS device

2004-02-25 Thread Betsie Spann
My RH AS 3.0 system frequently hangs with the message "nfs: server
pafiler not responding".
It's a NetApp NAS device that is NFS-mounted read-only by the entire
universe here.  My Linux guest frequently waits on it.  Then it has to
be rebooted.  Network restarts hang on the server also.
I am using NFS vers 3 with a timeo value of 40 and rsize/wsize of 8192
using UDP.  I'm going to try the tcp option next.
Any suggestions or known problems, please?
Betsie


Re: nfs hangs on NetApp NAS device

2004-02-25 Thread McKown, John
I _think_ you need to do a soft NFS mount instead of a hard mount.

Try

http://www.faqs.org/docs/linux_network/x-087-2-nfs.mountd.html

Look at the soft option. The hard option is the default.

--
John McKown
Senior Systems Programmer
UICI Insurance Center
Applications & Solutions Team

This message (including any attachments) contains confidential information
intended for a specific individual and purpose, and its' content is
protected by law.  If you are not the intended recipient, you should delete
this message and are hereby notified that any disclosure, copying, or
distribution of this transmission, or taking any action based on it, is
strictly prohibited.


Re: nfs hangs on NetApp NAS device

2004-02-25 Thread Adam Thornton
On Wed, 2004-02-25 at 12:17, McKown, John wrote:
> I _think_ you need to do a soft NFS mount instead of a hard mount.
>
> Try
>
> http://www.faqs.org/docs/linux_network/x-087-2-nfs.mountd.html
>
> Look at the soft option. The hard option is the default.

Well, then you won't have to reboot.

But it does mean the data you're getting isn't guaranteed.

If it's r/o that's probably OK.  Never ever ever mount soft with rw.
At least, that's *my* advice.
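
The "never soft with rw" rule can be checked mechanically. The sketch below scans text in the /proc/mounts or /etc/fstab layout (device, mount point, type, comma-separated options) and reports NFS mounts that are soft but not read-only. The function name is hypothetical. It assumes the soft option actually appears in the options field and that a mount without ro is read-write.

```python
def risky_nfs_mounts(mounts_text):
    """Return mount points of NFS filesystems that are soft and not read-only."""
    risky = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and anything too short to be an entry.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        if fs_type in ("nfs", "nfs4") and "soft" in options and "ro" not in options:
            risky.append(mount_point)
    return risky
```

Feeding it the contents of /proc/mounts on a client lists the mount points worth a second look. An empty list only means no mount advertises soft in its options.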

Adam


Re: nfs hangs on NetApp NAS device

2004-02-25 Thread McKown, John
Adam,

If you do not recommend the soft option (at least for R/W), what else is
possible? If the NFS server dies or is unavailable for some reason, does
that mean that all the client boxes which use it should die as well? I'm
truly curious because I don't use NFS much. In fact, here at work, we don't
use it at all. I do use it at home to cross-connect two Linux/Intel boxes.


--
John McKown
Senior Systems Programmer
UICI Insurance Center
Applications & Solutions Team


Re: nfs hangs on NetApp NAS device

2004-02-25 Thread Fargusson.Alan
When an NFS server dies or is unavailable, the clients should wait until
the server restarts or comes back online.  This is what NFS is designed
to do anyway.  If your box is completely hung there is probably some
problem with the client NFS implementation.  If your box does not return
to normal when the NFS server comes back online then there is certainly
some problem with the NFS client.

Note that if the server is down for a long time it may take the client a
long time to realize that the server is back online.


Re: nfs hangs on NetApp NAS device

2004-02-25 Thread Adam Thornton
On Wed, 2004-02-25 at 12:42, McKown, John wrote:
> If you do not recommend the soft option (at least for R/W), what else is
> possible? If the NFS server dies or is unavailable for some reason, does
> that mean that all the client boxes which use it should die as well?

Yes.

If you're mounting files you need to have read-write, and the underlying
filesystem goes away, you absolutely do not want to continue operations
with the files you have open.  If you do keep going, e.g. with a soft
mount, you're looking at Data Corruption City.

> I'm
> truly curious because I don't use NFS much. In fact, here at work, we don't
> use it at all. I do use it at home to cross-connect two Linux/Intel boxes.

I'm not a fan of NFS, although I am given to understand that v3 and v4
work a little better than v2 did.

AFS has a lot of nice features, but it's intrusive and doesn't work
quite like a normal Unix filesystem.  GFS looked promising but I haven't
really followed it recently.  A reasonably-performing distributed
read-write filesystem with Unix semantics would be a wonderful
thing...but I don't know of any such thing.

Adam


Re: nfs hangs on NetApp NAS device

2004-02-25 Thread Alan Cox
On Mer, 2004-02-25 at 20:03, Adam Thornton wrote:
> I'm not a fan of NFS, although I am given to understand that v3 and v4
> work a little better than v2 did.

v2 NFS is fairly simplistic.

v3 adds support for files > 2GB and support for client-side asynchronous
writeback done safely.

v4 adds a ton of stuff but is very new.

Generally speaking there isn't a good reason to run v2 except between
boxes that don't speak v3.


Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

2003-07-11 Thread Sergey Korzhevsky
Try 'mount -o nfsvers=2 '


WBR, Sergey


Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

2003-07-11 Thread Ashley Chaloner
Ted,

I have had a similar situation mounting an NFS volume (from a Sun) to
either a RH-7.2 or a RH-RawHide VM.

The processes in question seem to be sleeping in function "down" or
"wait_on_inode". So they look like they're in uninterruptible sleep,
so they don't get scheduled, so they never receive their termination
signals.

The problem occurs with automount and ordinary mount, but much more
with automount. If a server goes down and the hard option is
specified, the client's process(es) rightly hang until the server
comes back. However, in this case, NFS handles seem to get lost/broken
in a way that the client's processes think the server is down when it
isn't, so they hang.

(Also, a particular annoyance is that processes in uninterruptible
sleep are counted in the load average so there is a high load average
without any load on the processor.)
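
One way to confirm this on a hung client is to list the processes that are in state D (uninterruptible sleep). This is a rough sketch that reads /proc/<pid>/stat directly, and the helper names are made up for the example. It assumes the usual stat layout, where the command name sits in parentheses and the state letter follows it.

```python
import os


def parse_stat(stat_text):
    """Return (pid, command, state) from the text of /proc/<pid>/stat."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split around the last ")".
    head, _, tail = stat_text.rpartition(")")
    pid, _, command = head.partition("(")
    return int(pid), command, tail.split()[0]


def uninterruptible(proc_root="/proc"):
    """Return (pid, command) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            blocked.append((pid, command))
    return blocked
```

If the processes returned by uninterruptible() stay the same across repeated calls while the load average climbs, that fits the picture described above: they are stuck waiting and are not using the processor.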

Conclusion (guessed): The problem is in the kernel NFS code,
perhaps search the source for: wait_on_inode (in fs/inode.c),
nfs_wait_on_inode (in fs/nfs/inode.c), down (in asm/semaphore.h).

I hope this helps. (It looks at least like you're closer to absolving
the VMNFS side of things :-)

Ashley Chaloner.


DCS,UoW,UK.
http://www.dcs.warwick.ac.uk/~csuwf/


On Thu, 10 Jul 2003, Ted Manos wrote:
 Date: Thu, 10 Jul 2003 21:21:15 -0500
 From: Ted Manos [EMAIL PROTECTED]
 Reply-To: Linux on 390 Port [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

 (cross-posted to [EMAIL PROTECTED] and [EMAIL PROTECTED])


 Hello all (particularly Alan, Romney and crew!),


 We have been doing testing with a new development version of SAS V9 for
 Linux390 for a couple months now, and had not run into any major issues
 until just recently.  We are near the end of our Proof of Concept, and
 just ran into this problem which is a major stumbling block for us.  It
 appears to be an NFS locking issue, and not due to SAS.  However, I learned
 many, many moons ago, back when most of my hair wasn't grey (or I even
 *had* most of my hair, for that matter!) to never rule out ANYTHING until a
 problem/issue is resolved.


 The problem is that when we try to write a SAS-format dataset from
 SAS/Linux to an NFS-mounted CMS SFS directory, it hangs NFS.  If we write
 the SAS dataset to a local Linux directory, everything is fine.  If we have
 SAS read and/or write a flat-file to the NFS-mounted SFS directory, things
 are fine.  If we have SAS *read* a SAS dataset from an NFS-mounted CMS SFS
 directory (the dataset was created on SAS/Unix then Ftp'd to SFS),
 everything is fine.  But if we try to either update or create a new SAS
 dataset on the NFS-mounted SFS directory, it hangs things up tight.  (Note
 that we are *NOT* trying to read or write the SAS datasets from SAS in CMS
 or anywhere else... just use the NFS-mounted SFS directory space as a
 remote storage pool.)


 When it hangs, the only way to get rid of all the remaining spawn zombies
 is to re-IPL the Linux guest.  The kill command will terminate most of the
 processes, but not all of them.   (Yes, I tried killing them from root...
 every way I knew how... but am always open to new ideas/suggestions!)  I
 have no idea at this stage where the hang-up is occurring -- in the Linux
 NFS software, the Linux kernel itself, the VM/CMS NFS server software, one
 of the IP stacks, SAS, or someplace else.  I'm not even sure at this stage
 how to go about tracking it down, since there are a number of parts/pieces
 that all come into play at various stages (I can function fairly well in
 Linux, but I'm no real geek Linux hacker!).

 By hung, I mean that all I/O (at least as far as I can tell) between the
 SAS program running on Linux, the Linux NFS client representing the
 particular Linux mount point/directory being used, and the VMNFS NFS server
 had ceased to occur.  Also,  any further attempts to initiate I/O to that
 NFS mount point, from any other ID/process also hang.  Even root is no
 longer able to do a simple directory on the mount point (e.g. ls -l
 /terry), it hangs.  It appears to be hung due to some kind of lock, or
 pending some condition/state.  That I can readily ascertain, there is no
 CPU or I/O being burned in a loop.

 I do not believe that the problem is SFS, or that SFS is hung.  SFS
 continues to function perfectly normally when accessed from CMS.  I also
 don't *think* that it is the VMNFS server, as that appears to continue to
 function normally for any/all other mount points it is serving, just not
 the one that has hung.

 When I kill the originating process, and finally get it and all of its
 spawn killed off, there still remain two of its spawn which I can not kill,
 even from root, no matter what signal I try to use.  The only way I am able
 to reset everything to that mount point, so it can again be made
 operational, is to completely shutdown and re-IPL that Linux instance, and
 then re-mount all the NFS mount points.  I do NOT have to do *anything

Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

2003-07-11 Thread John Summerfield
On Thu, 10 Jul 2003, Ted Manos wrote:

-snip-

Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

2003-07-11 Thread John Summerfield
On Fri, 11 Jul 2003, Ashley Chaloner wrote:

 (Also, a particular annoyance is that processes in uninterruptible
 sleep are counted in the load average so there is a high load average
 without any load on the processor.)

Load average counts active processes, and a process that is actively
waiting on a device is active. It's not just CPU activity.

Processes waiting for their turn at the CPU are counted too, and that's
why one of my systems went past 114 a few months ago.
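
A simplified sketch of why that happens, assuming the load average is an exponentially damped average of the count of runnable plus uninterruptible tasks, sampled about every 5 seconds. This is floating-point arithmetic for illustration, not the kernel's fixed-point code, and the function names are invented here.

```python
import math

SAMPLE_INTERVAL = 5.0   # seconds between samples (assumed)
WINDOW = 60.0           # the 1-minute average

def next_load(load, runnable, uninterruptible):
    """One update step of an exponentially damped load average."""
    active = runnable + uninterruptible
    decay = math.exp(-SAMPLE_INTERVAL / WINDOW)
    return load * decay + active * (1.0 - decay)

def simulate(samples, runnable, uninterruptible, load=0.0):
    """Apply next_load repeatedly with a constant task count."""
    for _ in range(samples):
        load = next_load(load, runnable, uninterruptible)
    return load

if __name__ == "__main__":
    # Two processes stuck on a hung NFS mount, nothing using the CPU.
    print(round(simulate(120, runnable=0, uninterruptible=2), 2))
```

With two stuck processes and an idle CPU, the value climbs toward 2 and stays there until the processes go away.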




--


Cheers
John.

Join the Linux Support by Small Businesses list at
http://mail.computerdatasafe.com.au/mailman/listinfo/lssb
Copyright John Summerfield. Reproduction prohibited.


Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

2003-07-11 Thread Eddie Chen
On the Linux side, mount the SFS (BFS) directory with rsize=1024 and
wsize=1024, and see if it still hangs.
Also, we need your routing config... how many hops?
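
A small sketch of what that mount might look like, built as an argument list so it can be inspected before it is run. The server name and export path are placeholders, not values from this thread; rsize and wsize are standard NFS mount options.

```python
def nfs_mount_command(server, export, mount_point, rsize=1024, wsize=1024):
    """Build the argument list for an NFS mount with small transfer sizes."""
    options = "rsize=%d,wsize=%d" % (rsize, wsize)
    return ["mount", "-t", "nfs", "-o", options,
            "%s:%s" % (server, export), mount_point]

if __name__ == "__main__":
    # Hypothetical server and export; substitute your own.
    print(" ".join(nfs_mount_command("vmnfs.example.com", "/export/path", "/terry")))
```

The list form can be handed to subprocess.run as root once the values are right.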




Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

2003-07-11 Thread Adam Thornton
On Thu, 2003-07-10 at 21:21, Ted Manos wrote:

 I do not believe that the problem is SFS, or that SFS is hung.  SFS
 continues to function perfectly normally when accessed from CMS.  I also
 don't *think* that it is the VMNFS server, as that appears to continue to
 function normally for any/all other mount points it is serving, just not
 the one that has hung.

 When I kill the originating process, and finally get it and all of its
 spawn killed off, there still remain two of its spawn which I can not kill,
 even from root, no matter what signal I try to use.  The only way I am able
 to reset everything to that mount point, so it can again be made
 operational, is to completely shutdown and re-IPL that Linux instance, and
 then re-mount all the NFS mount points.  I do NOT have to do *anything*
 whatever to VM, SFS or the VMNFS server.

This sure sounds like classic NFS-client-stuck-in-disk-wait behavior.

This is also why NFS sucks, but that's neither here nor there.

One thing I'd suggest:

 I am running 31-bit Linux 2.4.7-SuSE-SMP #1 SMP Wed Oct 17 15:31:03 GMT
 2001 s390  under z/VM V4.3.0 (PUT 0301) on a 2064-1Cx IFL processor.

Is this the latest and greatest patched kernel from SuSE?  IIRC, SuSE
likes to put the NFS server in the kernel, and so it's possible that
this is a bug that has already been addressed.

If it were *me*, I'd give it a shot with a 2.4.21 kernel and see what
happened.

Adam


Re: NFS hangs writing to SFS from SAS/Linux390 (moderately long)

2003-07-11 Thread Post, Mark K
No, there have been some updates since then:
~  uname -a
Linux 2.4.7-timer-SMP #1 SMP Tue May 21 12:58:16 GMT 2002 s390 unknown

There is a non-timer-patched version from the same date.


Mark Post



NFS hangs writing to SFS from SAS/Linux390 (moderately long)

2003-07-10 Thread Ted Manos
(cross-posted to [EMAIL PROTECTED] and [EMAIL PROTECTED])


Hello all (particularly Alan, Romney and crew!),


We have been doing testing with a new development version of SAS V9 for
Linux390 for a couple months now, and had not run into any major issues
until just recently.  We are near the end of our Proof of Concept, and
just ran into this problem which is a major stumbling block for us.  It
appears to be an NFS locking issue, and not due to SAS.  However, I learned
many, many moons ago, back when most of my hair wasn't grey (or I even
*had* most of my hair, for that matter!) to never rule out ANYTHING until a
problem/issue is resolved.


The problem is that when we try to write a SAS-format dataset from
SAS/Linux to an NFS-mounted CMS SFS directory, it hangs NFS.  If we write
the SAS dataset to a local Linux directory, everything is fine.  If we have
SAS read and/or write a flat-file to the NFS-mounted SFS directory, things
are fine.  If we have SAS *read* a SAS dataset from an NFS-mounted CMS SFS
directory (the dataset was created on SAS/Unix then Ftp'd to SFS),
everything is fine.  But if we try to either update or create a new SAS
dataset on the NFS-mounted SFS directory, it hangs things up tight.  (Note
that we are *NOT* trying to read or write the SAS datasets from SAS in CMS
or anywhere else... just use the NFS-mounted SFS directory space as a
remote storage pool.)


When it hangs, the only way to get rid of all the remaining spawn zombies
is to re-IPL the Linux guest.  The kill command will terminate most of the
processes, but not all of them.   (Yes, I tried killing them from root...
every way I knew how... but am always open to new ideas/suggestions!)  I
have no idea at this stage where the hang-up is occurring -- in the Linux
NFS software, the Linux kernel itself, the VM/CMS NFS server software, one
of the IP stacks, SAS, or someplace else.  I'm not even sure at this stage
how to go about tracking it down, since there are a number of parts/pieces
that all come into play at various stages (I can function fairly well in
Linux, but I'm no real geek Linux hacker!).

By hung, I mean that all I/O (at least as far as I can tell) between the
SAS program running on Linux, the Linux NFS client representing the
particular Linux mount point/directory being used, and the VMNFS NFS server
had ceased to occur.  Also,  any further attempts to initiate I/O to that
NFS mount point, from any other ID/process also hang.  Even root is no
longer able to do a simple directory on the mount point (e.g. ls -l
/terry), it hangs.  It appears to be hung due to some kind of lock, or
pending some condition/state.  That I can readily ascertain, there is no
CPU or I/O being burned in a loop.
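
One way to test a mount point without hanging the shell you are working in is to do the stat in a child process and give up after a timeout. This is only a sketch under that assumption; the function name and the default timeout are arbitrary choices for this example.

```python
import subprocess
import sys

def mount_responds(path, timeout=10.0):
    """Return True if a stat() of `path` succeeds within `timeout` seconds.

    The stat runs in a child process so that a hung NFS mount blocks the
    child, not the caller.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Try to kill it, but do not wait again: a child stuck in
        # uninterruptible sleep may ignore the signal, and waiting
        # for it would hang the caller as well.
        child.kill()
        return False
    return child.returncode == 0

if __name__ == "__main__":
    print(mount_responds("/terry"))
```

Note that each failed probe may leave one more stuck child behind, so it is not something to run in a tight loop.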

I do not believe that the problem is SFS, or that SFS is hung.  SFS
continues to function perfectly normally when accessed from CMS.  I also
don't *think* that it is the VMNFS server, as that appears to continue to
function normally for any/all other mount points it is serving, just not
the one that has hung.

When I kill the originating process, and finally get it and all of its
spawn killed off, there still remain two of its spawn which I can not kill,
even from root, no matter what signal I try to use.  The only way I am able
to reset everything to that mount point, so it can again be made
operational, is to completely shutdown and re-IPL that Linux instance, and
then re-mount all the NFS mount points.  I do NOT have to do *anything*
whatever to VM, SFS or the VMNFS server.

Does that absolve them completely??  LOL... not in MY lifetime!  I've been
doing this stuff WAYYY too long to believe that until it is PROVEN to me.
It is certainly possible that the hang is being caused by some bad/goofy
permission within Linux, NFS, VMNFS or SFS itself, or even VMSECURE or
the ESM... or any other part or piece that may come into play.  But, I do
tend to *doubt* it, since everything else continues to function as it
should, and the Linux NFS mount point comes back and functions normally
after Linux has been recycled and the NFS mounts re-issued.

Unless I am missing something somewhere, it is my belief that an
NFS-mounted SFS directory should not appear any differently to Linux/Unix
than any other type of file system structure (with the exception of the 8.8
filename limitation), since it is  a hierarchical tree directory structure
and supports very large records.  Record format, record length and blocking
(if any) shouldn't really be a factor if the file is just being written
from Linux and read by Linux, with nothing else coming along in between and
mucking with things.  The NFS-mounted SFS should just look like any other
Linux/Unix directory/filesystem -- just a pool of disk space available to
use until you've hit your quota.


I am running 31-bit Linux 2.4.7-SuSE-SMP #1 SMP Wed Oct 17 15:31:03 GMT
2001 s390  under z/VM V4.3.0 (PUT 0301) on a 2064-1Cx IFL processor.


Below is some related file information which may help someone to show me
the 

Re: [VMESA-L] NFS hangs writing to SFS from SAS/Linux390 (moderately long)

2003-07-10 Thread Ferguson, Neale
What was the format of the mount command that you used? (I.e., what options
were specified? Just enter mount to display them.) If you haven't, try
specifying intr and soft as options. This way you should be able to kill
things without a re-IPL. Also, what rsize/wsize did you specify? What's the
MTU between the Linux guest and NFS server? Are you using NFS v2 or v3? If
v3, are you using TCP rather than UDP?
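
To collect the answers in one go, something like the following sketch could pull the NFS options out of /proc/mounts. It assumes the usual whitespace-separated layout (device, mount point, type, comma-separated options); the exact option names shown there vary by kernel version, and the function names are made up for this example.

```python
def parse_mount_line(line):
    """Split one /proc/mounts line into (device, mount_point, fstype, options)."""
    device, mount_point, fstype, opts = line.split()[:4]
    options = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        # Flags such as "hard" or "intr" have no value; record them as True.
        options[key] = value if sep else True
    return device, mount_point, fstype, options

def nfs_mounts(mounts_text):
    """Yield (mount_point, options) for every NFS entry in /proc/mounts text."""
    for line in mounts_text.splitlines():
        if not line.strip():
            continue
        device, mount_point, fstype, options = parse_mount_line(line)
        if fstype in ("nfs", "nfs4"):
            yield mount_point, options

if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mount_point, options in nfs_mounts(f.read()):
            wanted = ("hard", "soft", "intr", "rsize", "wsize")
            print(mount_point, {k: options[k] for k in wanted if k in options})
```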

Neale
