Re: F12 NFS Failures

2009-12-21 Thread Gordon Messmer

On 11/24/2009 04:21 AM, John Austin wrote:


Just tested my machine with UDP and TCP
This was using md5sum for about 10GB over the NFS mount

1. The default for F12/Centos5.4 appears to be TCP - which freezes
2. Forcing UDP gives NO errors for 10GB transfer
3. Forcing TCP gives a freeze
   


I know this is an old thread, but I thought I'd toss in that you will 
see symptoms very much like this if only one of your machines (probably 
the NFS server) is configured to use jumbo frames.  You should check the 
MTU on the server and client.
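
For anyone wanting to run that check, here is a minimal sketch. It assumes
a Linux host with sysfs mounted; the interface name eth0 and the <server>
address are placeholders, not values from this thread.

```shell
#!/bin/sh
# Sketch: report an interface's MTU and the largest ping payload it can carry.
# Assumptions: Linux with sysfs mounted; "eth0" below is a placeholder name.

# Print the MTU of interface $1 as recorded in sysfs.
iface_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Largest ICMP echo payload for MTU $1: subtract 20 bytes of IPv4 header
# and 8 bytes of ICMP header.
max_ping_payload() {
    echo $(( $1 - 28 ))
}

# Example use, run by hand on both client and server:
#   iface_mtu eth0
#   ping -M do -c 3 -s "$(max_ping_payload 1500)" <server>
# "-M do" forbids fragmentation, so the ping should fail rather than be
# fragmented if the path cannot carry a packet of that size.
```

If both ends report the same MTU and the unfragmented ping succeeds at that
size, a jumbo-frame mismatch is unlikely to be the cause.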


--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines


Re: F12 NFS Failures

2009-12-21 Thread John Austin
On Mon, 2009-12-21 at 00:39 -0800, Gordon Messmer wrote:
 On 11/24/2009 04:21 AM, John Austin wrote:
 
  Just tested my machine with UDP and TCP
  This was using md5sum for about 10GB over the NFS mount
 
  1. The default for F12/Centos5.4 appears to be TCP - which freezes
  2. Forcing UDP gives NO errors for 10GB transfer
  3. Forcing TCP gives a freeze
 
 
 I know this is an old thread, but I thought I'd toss in that you will 
 see symptoms very much like this if only one of your machines (probably 
 the NFS server) is configured to use jumbo frames.  You should check the 
 MTU on the server and client.
 
Thanks for the idea.
I have checked the host and the server; both are set to an MTU of 1500.

I then checked the switch (Netgear GS108T), which had jumbo frames enabled.
Disabled jumbo frames - no change
Updated switch firmware - still no change

The problem is still present with all F12 kernel versions (sky2 driver)
to date.

I took the sky2 driver from the latest stable kernel and tried to compile
it under F12, but the build failed!

As I have a workaround with a 2nd NIC I have been lazy !!
Next move is probably a custom kernel.

John








Re: F12 NFS Failures

2009-12-01 Thread Todd Denniston

John Austin wrote, On Tue, 24 Nov 2009 12:21:58 +:

On Mon, 2009-11-23 at 15:00 -0800, Rick Stevens wrote:

On 11/21/2009 10:41 AM, John Austin wrote:

On Sat, 2009-11-21 at 11:11 -0700, Greg Woods wrote:

On Sat, 2009-11-21 at 10:09 +, John Austin wrote:


When copying a large file (2.7GB) from the server to the
F12 m/c a complete freeze of the F12 machine occurs.


I haven't seen freezes, but I have seen corruption when trying to copy
large files (e.g. like a DVD iso image) via NFS. In fact, this happened
to me when I was trying to install an F12 virtual machine on my F11 box
(so I could try it out before deciding whether or not to bite the bullet
and upgrade the host OS). I copied over the DVD iso image, then tried to
install a VM from it, and it failed the media test. Sure enough, it also
failed the sha256sum test. Copying the same DVD iso file via scp instead
worked fine. I do not trust NFS for large files.

--Greg



Hi Greg

That's interesting and very worrying - surely it can't/shouldn't happen!

I have been using NFS for years for all types/sizes of files and
never had a problem until the last couple of months.

1.  The Centos/RHEL 5.3/5.4 kernel had a serious bug that has been fixed
with the latest kernel update

2.  Now this F12 problem

Surely a very large worldwide community uses NFS ?

OK the F12 case could be my finger trouble or even a hardware problem

I will install F12 on a second machine and test again (against the same server)

Can you verify that you run into the same issue if you run NFS over TCP
as opposed to NFS over UDP (it's an option in the mount command on the
client, use either proto=tcp or proto=udp).

By default, the system queries the server and selects a protocol based
on what's being asked of it.  See the TRANSPORT METHODS section of
man nfs.
--
- Rick Stevens, Systems Engineer  ri...@nerd.com -
- AIM/Skype: therps2   ICQ: 22643734   Yahoo: origrps2 -
--
-   The Theory of Rapitivity: E=MC Hammer-
-  -- Glenn Marcus (via TopFive.com) -
--



Hi Rick

Many thanks for the reply - you have found a work-around !!

Just tested my machine with UDP and TCP
This was using md5sum for about 10GB over the NFS mount

1. The default for F12/Centos5.4 appears to be TCP - which freezes
2. Forcing UDP gives NO errors for 10GB transfer
3. Forcing TCP gives a freeze

Having briefly read the man pages this is the opposite of what I would
expect and of what you suggest !!

There must be a timing problem somewhere - 


Please see the other thread Sky2 NIC Problem? - Was F12 NFS Failures
for other tests I have carried out

Regards

John






What are your other mount options?
Having seen the Sky2 NIC Problem message, I think your card/driver may be
having issues, but some NFS options may help or hurt.


I am assuming that you only have 'hard' and not 'hard,intr' as options to
the mount.
And for transferring large files over NFS, my experience says to stay away
from 'soft' NFS mounts.
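
A minimal sketch for listing the options actually in effect, assuming a
Linux client where the mount table is readable at /proc/mounts. The helper
names are made up for illustration.

```shell
#!/bin/sh
# Sketch: list NFS mounts and the options actually in effect.
# Assumption: a Linux client; the mount table defaults to /proc/mounts.

# Print "mountpoint options" for every nfs/nfs4 entry in mount table $1.
nfs_mount_options() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2, $4 }' "${1:-/proc/mounts}"
}

# Succeed if the comma-separated option list $1 contains option $2
# (for example hard, soft or intr).
has_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running nfs_mount_options with no argument on the client shows, per mount
point, whether 'hard', 'soft' or 'intr' ended up in the option list.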

It is interesting that NFS over TCP locks the machine and fails to copy the
very large file, while UDP succeeds in copying the same file with the same
device/driver. BTW, when you say that UDP gave no errors, do you mean that
there were no errors from the user program perspective (cp, and then
sha256sum), or that there were none from both the user and syslog
perspectives? I am wondering if you have found a place where the UDP code
deals with a bad packet correctly and the TCP version has not seen enough
testing in bad environments. You wouldn't happen to have a serial cable
around so you can capture where the kernel goes bonkers, would you?
(Note: I have never done the serial console myself.)


--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter



Re: F12 NFS Failures

2009-12-01 Thread John Austin
On Tue, 2009-12-01 at 11:00 -0500, Todd Denniston wrote:
 John Austin wrote, On Tue, 24 Nov 2009 12:21:58 +:
  On Mon, 2009-11-23 at 15:00 -0800, Rick Stevens wrote:
  On 11/21/2009 10:41 AM, John Austin wrote:
  On Sat, 2009-11-21 at 11:11 -0700, Greg Woods wrote:
  On Sat, 2009-11-21 at 10:09 +, John Austin wrote:
 
  When copying a large file (2.7GB) from the server to the
  F12 m/c a complete freeze of the F12 machine occurs.
 
  I haven't seen freezes, but I have seen corruption when trying to copy
  large files (e.g. like a DVD iso image) via NFS. In fact, this happened
  to me when I was trying to install an F12 virtual machine on my F11 box
  (so I could try it out before deciding whether or not to bite the bullet
  and upgrade the host OS). I copied over the DVD iso image, then tried to
  install a VM from it, and it failed the media test. Sure enough, it also
  failed the sha256sum test. Copying the same DVD iso file via scp instead
  worked fine. I do not trust NFS for large files.
 
  --Greg
 
 
  Hi Greg
 
  That's interesting and very worrying - surely it can't/shouldn't happen!
 
  I have been using NFS for years for all types/sizes of files and
  never had a problem until the last couple of months.
 
  1.  The Centos/RHEL 5.3/5.4 kernel had a serious bug that has been fixed
  with the latest kernel update
 
  2.  Now this F12 problem
 
  Surely a very large worldwide community uses NFS ?
 
  OK the F12 case could be my finger trouble or even a hardware problem
 
  I will install F12 on a second machine and test again (against the same 
  server)
  Can you verify that you run into the same issue if you run NFS over TCP
  as opposed to NFS over UDP (it's an option in the mount command on the
  client, use either proto=tcp or proto=udp).
 
  By default, the system queries the server and selects a protocol based
  on what's being asked of it.  See the TRANSPORT METHODS section of
  man nfs.
  --
  - Rick Stevens, Systems Engineer  ri...@nerd.com -
  - AIM/Skype: therps2   ICQ: 22643734   Yahoo: origrps2 -
  --
  -   The Theory of Rapitivity: E=MC Hammer-
  -  -- Glenn Marcus (via TopFive.com) -
  --
  
  
  Hi Rick
  
  Many thanks for the reply - you have found a work-around !!
  
  Just tested my machine with UDP and TCP
  This was using md5sum for about 10GB over the NFS mount
  
  1. The default for F12/Centos5.4 appears to be TCP - which freezes
  2. Forcing UDP gives NO errors for 10GB transfer
  3. Forcing TCP gives a freeze
  
  Having briefly read the man pages this is the opposite of what I would
  expect and of what you suggest !!
  
  There must be a timing problem somewhere - 
  
  Please see the other thread Sky2 NIC Problem? - Was F12 NFS Failures
  for other tests I have carried out
  
  Regards
  
  John
  
  
  
  
 
 What are your other mount options?
 Having seen the Sky2 NIC Problem message, I think your card/driver may be
 having issues, but some NFS options may help or hurt.
 
 I am assuming that you only have 'hard' and not 'hard,intr' as options to
 the mount.
 And for transferring large files over NFS, my experience says to stay away
 from 'soft' NFS mounts.
 
 It is interesting that NFS over TCP locks the machine and fails to copy the
 very large file, while UDP succeeds in copying the same file with the same
 device/driver. BTW, when you say that UDP gave no errors, do you mean that
 there were no errors from the user program perspective (cp, and then
 sha256sum), or that there were none from both the user and syslog
 perspectives?

Purely from the user's point of view. I did not check the number of
retransmissions, log files, etc.
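
A minimal sketch of how the retransmission count could be read, assuming a
Linux client that exposes /proc/net/rpc/nfs, where the "rpc" line holds
calls, retransmissions and auth refreshes in that order. The function name
is made up for illustration.

```shell
#!/bin/sh
# Sketch: read the client-side RPC call and retransmission counters.
# Assumption: Linux client exposing /proc/net/rpc/nfs, whose "rpc" line
# holds three counters: calls, retransmissions, auth refreshes.

# Print "calls retrans" from stats file $1 (default /proc/net/rpc/nfs).
rpc_retrans() {
    awk '$1 == "rpc" { print $2, $3 }' "${1:-/proc/net/rpc/nfs}"
}
```

Comparing the output before and after a transfer shows how many
retransmissions it caused; nfsstat -rc should report the same counters.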

 I am wondering if you have found a place where the UDP code deals with a
 bad packet correctly and the TCP version has not seen enough testing in
 bad environments.

 You wouldn't happen to have a serial cable around so you can capture where
 the kernel goes bonkers, would you? (Note: I have never done the serial
 console myself.)
 
I've probably got a serial cable in the roof somewhere but the machine
has no serial ports! Shuttle SA76G2.

Hi Todd

I must admit that I have basically given up on the sky2 driver for the
moment.

I gave up after reading about problems with the sky2 driver going back
as far as something like 2.6.18.

I had a spare D-Link gigabit NIC and have been using that.

My whole network depends on NFS working perfectly so a dodgy driver is
no use to me.

It must be a very subtle bug as I cannot cause the freeze with
1. scp 10GB across the network
2. md5sum across a CIFS samba mount
3. md5sum across NFS4 UDP

Maybe you are right and it would fail if I tried harder/longer

Regards

John


  






Re: F12 NFS Failures

2009-11-24 Thread John Austin
On Mon, 2009-11-23 at 15:00 -0800, Rick Stevens wrote:
 On 11/21/2009 10:41 AM, John Austin wrote:
  On Sat, 2009-11-21 at 11:11 -0700, Greg Woods wrote:
  On Sat, 2009-11-21 at 10:09 +, John Austin wrote:
 
 
  When copying a large file (2.7GB) from the server to the
  F12 m/c a complete freeze of the F12 machine occurs.
 
 
  I haven't seen freezes, but I have seen corruption when trying to copy
  large files (e.g. like a DVD iso image) via NFS. In fact, this happened
  to me when I was trying to install an F12 virtual machine on my F11 box
  (so I could try it out before deciding whether or not to bite the bullet
  and upgrade the host OS). I copied over the DVD iso image, then tried to
  install a VM from it, and it failed the media test. Sure enough, it also
  failed the sha256sum test. Copying the same DVD iso file via scp instead
  worked fine. I do not trust NFS for large files.
 
  --Greg
 
 
 
  Hi Greg
 
  That's interesting and very worrying - surely it can't/shouldn't happen!
 
  I have been using NFS for years for all types/sizes of files and
  never had a problem until the last couple of months.
 
  1.  The Centos/RHEL 5.3/5.4 kernel had a serious bug that has been fixed
  with the latest kernel update
 
  2.  Now this F12 problem
 
  Surely a very large worldwide community uses NFS ?
 
  OK the F12 case could be my finger trouble or even a hardware problem
 
  I will install F12 on a second machine and test again (against the same 
  server)
 
 Can you verify that you run into the same issue if you run NFS over TCP
 as opposed to NFS over UDP (it's an option in the mount command on the
 client, use either proto=tcp or proto=udp).
 
 By default, the system queries the server and selects a protocol based
 on what's being asked of it.  See the TRANSPORT METHODS section of
 man nfs.
 --
 - Rick Stevens, Systems Engineer  ri...@nerd.com -
 - AIM/Skype: therps2   ICQ: 22643734   Yahoo: origrps2 -
 --
 -   The Theory of Rapitivity: E=MC Hammer-
 -  -- Glenn Marcus (via TopFive.com) -
 --


Hi Rick

Many thanks for the reply - you have found a work-around !!

Just tested my machine with UDP and TCP
This was using md5sum for about 10GB over the NFS mount

1. The default for F12/Centos5.4 appears to be TCP - which freezes
2. Forcing UDP gives NO errors for 10GB transfer
3. Forcing TCP gives a freeze

Having briefly read the man pages this is the opposite of what I would
expect and of what you suggest !!

There must be a timing problem somewhere - 

Please see the other thread Sky2 NIC Problem? - Was F12 NFS Failures
for other tests I have carried out

Regards

John





Re: F12 NFS Failures

2009-11-23 Thread Rick Stevens

On 11/21/2009 10:41 AM, John Austin wrote:

On Sat, 2009-11-21 at 11:11 -0700, Greg Woods wrote:

On Sat, 2009-11-21 at 10:09 +, John Austin wrote:



When copying a large file (2.7GB) from the server to the
F12 m/c a complete freeze of the F12 machine occurs.



I haven't seen freezes, but I have seen corruption when trying to copy
large files (e.g. like a DVD iso image) via NFS. In fact, this happened
to me when I was trying to install an F12 virtual machine on my F11 box
(so I could try it out before deciding whether or not to bite the bullet
and upgrade the host OS). I copied over the DVD iso image, then tried to
install a VM from it, and it failed the media test. Sure enough, it also
failed the sha256sum test. Copying the same DVD iso file via scp instead
worked fine. I do not trust NFS for large files.

--Greg




Hi Greg

That's interesting and very worrying - surely it can't/shouldn't happen!

I have been using NFS for years for all types/sizes of files and
never had a problem until the last couple of months.

1.  The Centos/RHEL 5.3/5.4 kernel had a serious bug that has been fixed
with the latest kernel update

2.  Now this F12 problem

Surely a very large worldwide community uses NFS ?

OK the F12 case could be my finger trouble or even a hardware problem

I will install F12 on a second machine and test again (against the same server)


Can you verify that you run into the same issue if you run NFS over TCP
as opposed to NFS over UDP (it's an option in the mount command on the
client, use either proto=tcp or proto=udp).

By default, the system queries the server and selects a protocol based
on what's being asked of it.  See the TRANSPORT METHODS section of
man nfs.
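
A minimal sketch of the two mount invocations follows. The export
server:/export and the mount point /mnt/test are placeholders, and the
helper only prints the commands rather than running them, since mounting
needs root.

```shell
#!/bin/sh
# Sketch: build the mount command for a given NFS transport.
# Assumptions: server:/export and /mnt/test are placeholders.

# Print (do not run) a mount command forcing transport $1 (tcp or udp)
# for export $2 on mount point $3.
nfs_mount_cmd() {
    case "$1" in
        tcp|udp) echo "mount -t nfs -o proto=$1 $2 $3" ;;
        *) echo "unknown transport: $1" >&2; return 1 ;;
    esac
}

nfs_mount_cmd tcp server:/export /mnt/test
nfs_mount_cmd udp server:/export /mnt/test
```

Unmount between the two runs so that each test really uses the transport
that was asked for.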
--
- Rick Stevens, Systems Engineer  ri...@nerd.com -
- AIM/Skype: therps2   ICQ: 22643734   Yahoo: origrps2 -
--
-   The Theory of Rapitivity: E=MC Hammer-
-  -- Glenn Marcus (via TopFive.com) -
--



Re: F12 NFS Failures

2009-11-21 Thread Antonio Olivares


--- On Sat, 11/21/09, John Austin j...@jaa.org.uk wrote:

 From: John Austin j...@jaa.org.uk
 Subject: F12 NFS Failures
 To: fedora-list@redhat.com
 Date: Saturday, November 21, 2009, 2:09 AM
 Hi
 
 I have just completed a clean install of F12 and
 subsequent yum update on a client machine.
 NFS was used for the install - no problems !!
 I am using a fully updated Centos 5.4 nfs server
 
 When copying a large file (2.7GB) from the server to the
 F12 m/c a complete freeze of the F12 machine occurs.
 No mouse, keyboard, ssh login. 
 Only hitting the Reset button gets it back.
 
 F12 is installed on the only disk on the machine which has several ext3
 partitions. A fully updated F11 is on one of the partitions
 
 I have tried
 1. Changing from NFS4 to NFS3 - Still locks up
 2. scp the same file from the server to F12 no problem
 3. md5sum on the file across the nfs mount - a read only? - F12 freezes
 4. Booting the F11 partition and copying the same file - no problems
 5. Tried playing with Defaultvers=4 in /etc/nfsmount.conf - still locks
 
 I have googled but not found anything useful so far
 
 My understanding is that NFS code is in the kernel - is that correct?
 
 Has anyone seen this or has any ideas about the next move

1) Before doing anything, check the status of NFS, i.e.,

# service nfs status

2) NFS may be failing because something is not letting it run correctly.
I saw this in the boot messages back in the Fedora 12 rawhide testing days,
so could it be that the service is not running, or that something is
stopping it from working properly?

 
 Regards
 
 John
 
 
 
 
 -- 
Regards,

Antonio 


  



Re: F12 NFS Failures

2009-11-21 Thread John Austin
On Sat, 2009-11-21 at 06:33 -0800, Antonio Olivares wrote:
 
 --- On Sat, 11/21/09, John Austin j...@jaa.org.uk wrote:
 
  From: John Austin j...@jaa.org.uk
  Subject: F12 NFS Failures
  To: fedora-list@redhat.com
  Date: Saturday, November 21, 2009, 2:09 AM
  Hi
  
  I have just completed a clean install of F12 and
  subsequent yum update on a client machine.
  NFS was used for the install - no problems !!
  I am using a fully updated Centos 5.4 nfs server
  
  When copying a large file (2.7GB) from the server to the
  F12 m/c a complete freeze of the F12 machine occurs.
  No mouse, keyboard, ssh login. 
  Only hitting the Reset button gets it back.
  
  F12 is installed on the only disk on the machine which has several ext3
  partitions. A fully updated F11 is on one of the partitions
  
  I have tried
  1. Changing from NFS4 to NFS3 - Still locks up
  2. scp the same file from the server to F12 no problem
  3. md5sum on the file across the nfs mount - a read only? - F12 freezes
  4. Booting the F11 partition and copying the same file - no problems
  5. Tried playing with Defaultvers=4 in /etc/nfsmount.conf - still locks
  
  I have googled but not found anything useful so far
  
  My understanding is that NFS code is in the kernel - is that correct?
  
  Has anyone seen this or has any ideas about the next move
 
 1) Before doing anything, check the status of NFS, i.e.,
 
 # service nfs status
 
 2) NFS may be failing because something is not letting it run correctly.
 I saw this in the boot messages back in the Fedora 12 rawhide testing days,
 so could it be that the service is not running, or that something is
 stopping it from working properly?
 Regards,
 
 Antonio 

Hi Antonio

Thanks for the reply

NFS is definitely running to some extent as home directories are mounted OK
and my global directory is also mounted OK.
The client only seems to fail during a large/long transfer

The autofs (NIS exported) files of interest are

maui.jaa.org.uk ~ 1# cat /etc/auto.home
#*  -fstype=nfs 148.197.29.5:/exports/home/
*   -fstype=nfs4,rsize=32768,wsize=32768  148.197.29.5:/home/

maui.jaa.org.uk ~ 2# cat /etc/auto.direct
#/global  -fstype=nfs  148.197.29.5:/exports/global
/global  -fstype=nfs4,rsize=32768,wsize=32768  148.197.29.5:/global

The client locks up with no indication of a problem in /var/log/messages
after a restart

The server shows
[r...@maui ~]# cat /var/log/messages |grep nfs
...
Nov 20 16:25:48 maui kernel: nfs4_cb: server 148.197.29.252 not responding, timed out
...

The client falls over at random times during a transfer and leaves a partially
copied file when using cp

I did wonder whether it was something to do with FS-Cache but
as far as I can see nfs is not using it.
dmesg includes
FS-Cache: Loaded
FS-Cache: Netfs 'nfs' registered for caching

but this shows no activity
cat /proc/fs/fscache/stats

John




Re: F12 NFS Failures

2009-11-21 Thread Greg Woods
On Sat, 2009-11-21 at 10:09 +, John Austin wrote:

 
 When copying a large file (2.7GB) from the server to the
 F12 m/c a complete freeze of the F12 machine occurs.


I haven't seen freezes, but I have seen corruption when trying to copy
large files (e.g. like a DVD iso image) via NFS. In fact, this happened
to me when I was trying to install an F12 virtual machine on my F11 box
(so I could try it out before deciding whether or not to bite the bullet
and upgrade the host OS). I copied over the DVD iso image, then tried to
install a VM from it, and it failed the media test. Sure enough, it also
failed the sha256sum test. Copying the same DVD iso file via scp instead
worked fine. I do not trust NFS for large files.
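
A minimal sketch of that comparison, assuming GNU coreutils sha256sum is
installed. The helper names and the example paths are made up for
illustration.

```shell
#!/bin/sh
# Sketch: compare the SHA-256 digests of two copies of a file.
# Assumption: sha256sum from GNU coreutils is on the PATH.

# Print only the hash field of sha256sum's output for file $1.
sha256_of() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Succeed if files $1 and $2 have the same SHA-256 digest.
same_checksum() {
    [ "$(sha256_of "$1")" = "$(sha256_of "$2")" ]
}

# Example use with placeholder paths:
#   same_checksum /mnt/nfs/image.iso /tmp/image-scp.iso && echo match || echo MISMATCH
```

A mismatch between the NFS copy and the scp copy of the same source file
points at the transfer rather than at the image itself.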

--Greg

