Re: [CentOS] NFS issues

2008-09-24 Thread Akemi Yagi
On Thu, Sep 4, 2008 at 8:09 AM, Akemi Yagi [EMAIL PROTECTED] wrote:
> On Thu, Sep 4, 2008 at 7:35 AM, Akemi Yagi [EMAIL PROTECTED] wrote:
>
>> CentOS developer, Tru, compiled a patched version of regular kernel
>> and is offering it at:
>>
>> http://people.centos.org/tru/kernel+bz453094/
>>
>> Also, the fix will be in the upcoming kernel-2.6.18-92.1.13.el5
>> according to the bugzilla referred to above.
>
> The bugzilla link is actually this one:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=459083
>
> Akemi

kernel-2.6.18-92.1.13.el5 is out (upstream):

http://rhn.redhat.com/errata/RHSA-2008-0885.html

Akemi
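
For anyone who wants to check whether a machine is already on the fixed build, here is a minimal Python sketch. It assumes the fix first ships in kernel-2.6.18-92.1.13.el5, as stated above, and compares only the leading numeric fields of the release string. The function names are illustrative, and this is a simplification rather than RPM's real version-comparison logic.

```python
import platform
import re

# Fixed kernel named in the thread; the parsing below is a simplification,
# not RPM's full version-comparison algorithm.
FIXED_RELEASE = "2.6.18-92.1.13.el5"


def numeric_parts(release):
    """Return the leading numeric fields of a kernel release as a tuple."""
    leading = re.match(r"[\d.\-]*", release).group(0)
    return tuple(int(p) for p in re.split(r"[.\-]", leading) if p)


def has_lockd_fix(release, fixed=FIXED_RELEASE):
    """True when `release` is numerically at or above the fixed build."""
    return numeric_parts(release) >= numeric_parts(fixed)


if __name__ == "__main__":
    running = platform.release()
    print(running, "includes fix" if has_lockd_fix(running) else "predates fix")
```

The result is only meaningful on an EL5 system. A privately patched kernel with a lower release number (such as a rebuilt centosplus package) would be reported as predating the fix even though it carries the patch.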
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] NFS issues

2008-09-24 Thread Craig White
On Wed, 2008-09-24 at 13:38 -0700, Akemi Yagi wrote:
> On Thu, Sep 4, 2008 at 8:09 AM, Akemi Yagi [EMAIL PROTECTED] wrote:
>> On Thu, Sep 4, 2008 at 7:35 AM, Akemi Yagi [EMAIL PROTECTED] wrote:
>>
>>> CentOS developer, Tru, compiled a patched version of regular kernel
>>> and is offering it at:
>>>
>>> http://people.centos.org/tru/kernel+bz453094/
>>>
>>> Also, the fix will be in the upcoming kernel-2.6.18-92.1.13.el5
>>> according to the bugzilla referred to above.
>>
>> The bugzilla link is actually this one:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=459083
>>
>> Akemi
>
> kernel-2.6.18-92.1.13.el5 is out (upstream):
>
> http://rhn.redhat.com/errata/RHSA-2008-0885.html
 

Yep, and I'm still running an old kernel to get around this. I got the
notification from bugzilla today myself. Hooray!

Craig



Re: [CentOS] NFS issues

2008-09-04 Thread Akemi Yagi
On Thu, Sep 4, 2008 at 7:35 AM, Akemi Yagi [EMAIL PROTECTED] wrote:

> CentOS developer, Tru, compiled a patched version of regular kernel
> and is offering it at:
>
> http://people.centos.org/tru/kernel+bz453094/
>
> Also, the fix will be in the upcoming kernel-2.6.18-92.1.13.el5
> according to the bugzilla referred to above.

The bugzilla link is actually this one:

https://bugzilla.redhat.com/show_bug.cgi?id=459083

Akemi


Re: [CentOS] NFS issues

2008-08-14 Thread Filipe Brandenburger
On Wed, Aug 13, 2008 at 09:48, Johan Swensson [EMAIL PROTECTED] wrote:
> I was also thinking about mounting the nfs shares as soft, is this a good
> idea?

No, this is a bad idea. Mounting as soft means that if there are any
errors or timeouts, your writes will fail, and most programs don't
check the status of those, so you will have undetected data
loss.
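
To illustrate what "checking the status" of a write means, here is a minimal Python sketch. The helper name `write_checked` and the path `/mnt/nfs/example.txt` are made up for the example. It assumes that on a soft mount a timeout reaches the program as an I/O error, possibly only at fsync or close time, since writes are normally buffered.

```python
import os


def write_checked(path, data):
    """Write bytes to path and surface any I/O error instead of ignoring it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may be a short write
            view = view[written:]
        os.fsync(fd)  # pushes buffered data out; deferred errors show up here
    finally:
        os.close(fd)  # close can also report an error on NFS


if __name__ == "__main__":
    try:
        write_checked("/mnt/nfs/example.txt", b"hello\n")  # hypothetical path
    except OSError as err:
        print("write failed:", err)
```

A program that skips the fsync and ignores errors from close can believe its data is safe when the server never received it.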

> And also, what's the difference between soft and intr?

Intr (which is a good idea) means that you can use kill to stop
processes that are hung waiting for the NFS server. The problem with
intr is that I have never seen it work. When my NFS server goes down,
the processes that are waiting for it stay in D state, no
matter whether I try to kill or even kill -9 them... So, although
intr seems like a good idea, in practice it does not make much of a
difference.
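
To see which processes are stuck in D state (uninterruptible sleep) during such a hang, a small Python sketch along these lines may help. It reads the state field from /proc/<pid>/stat on Linux. The function names are illustrative and the script only reports; it does not try to kill anything.

```python
import os


def parse_stat(stat_line):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command sits in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, right = stat_line.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style /proc
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            stat_path = os.path.join(proc_root, entry, "stat")
            with open(stat_path, errors="replace") as handle:
                pid, comm, state = parse_stat(handle.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```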

HTH,
Filipe


Re: [CentOS] NFS issues

2008-08-13 Thread andylockran
Not wanting to hijack the thread, but since around the same date I've had
issues with NFS updates being 'delayed' for anywhere between two seconds
and six hours.

Weird one.


Re: [CentOS] NFS issues

2008-08-13 Thread nate
Johan Swensson wrote:
> No firewall on either end and server responds to ping.
>
> client:
>    program vers proto   port
>     100000    2   tcp    111  portmapper
>     100000    2   udp    111  portmapper
>     100024    1   udp    889  status
>     100024    1   tcp    892  status

Doesn't look like nfslock is running on the client?

What does /etc/init.d/nfslock status say?

> As Craig said, he started noticing this about the time he upgraded to
> 5.2. The same goes for me: I started getting this problem about the time
> I upgraded the clients and server.

Maybe related to this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=453094

Try restarting nfslock on both client and server when it occurs?
Or try setting up a cron job to restart nfslock hourly on all systems
to see if that can work around the issue until a fix comes out?

nate
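
The check made above, reading the `rpcinfo -p` listing to see which RPC services are registered, can be sketched in Python as follows. The function names and the default list of wanted services are assumptions for illustration. The sample text is Johan's client listing from this thread, written in the usual five-column layout.

```python
def registered_services(rpcinfo_output):
    """Map service name -> set of (proto, port) from `rpcinfo -p` text."""
    services = {}
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port service
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        _program, _vers, proto, port, name = fields
        services.setdefault(name, set()).add((proto, int(port)))
    return services


def missing_services(rpcinfo_output, wanted=("portmapper", "status", "nlockmgr")):
    """Return the wanted service names that are not registered."""
    present = registered_services(rpcinfo_output)
    return [name for name in wanted if name not in present]


# Johan's client listing from this thread.
CLIENT = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    889  status
    100024    1   tcp    892  status
"""

print(missing_services(CLIENT))  # prints ['nlockmgr']
```

On a live system the text could come from running `rpcinfo -p` through `subprocess.run` and reading its standard output.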






Re: [CentOS] NFS issues

2008-08-13 Thread Johan Swensson

nate wrote:
> Johan Swensson wrote:
>> No firewall on either end and server responds to ping.
>>
>> client:
>>    program vers proto   port
>>     100000    2   tcp    111  portmapper
>>     100000    2   udp    111  portmapper
>>     100024    1   udp    889  status
>>     100024    1   tcp    892  status
>
> Doesn't look like nfslock is running on the client?
>
> What does /etc/init.d/nfslock status say?

[EMAIL PROTECTED] ~]# service nfslock status
rpc.statd (pid 2737) is running...

>> As Craig said, he started noticing this about the time he upgraded to
>> 5.2. The same goes for me: I started getting this problem about the time
>> I upgraded the clients and server.
>
> Maybe related to this bug:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=453094
>
> Try restarting nfslock on both client and server when it occurs?
> Or try setting up a cron job to restart nfslock hourly on all systems
> to see if that can work around the issue until a fix comes out?
>
> nate


  
Actually, I tried restarting both nfslock (on clients and server) and
nfs (on server), but it didn't help.

Is my solution of mounting with nolock bad?

I was also thinking about mounting the NFS shares as soft. Is this a
good idea? Could it help me? And also, what's the difference between
soft and intr?

I read the manual and thought they looked pretty similar.





Re: [CentOS] NFS issues

2008-08-13 Thread Matthew Kent
On Tue, 2008-08-12 at 14:27 +0200, Johan Swensson wrote:
> So I'm running nfs to get content to my web servers. Now I've had this
> problem 2 times (about 2 weeks since the last occurrence).
> I use drbd on the nfs server for redundancy. Now to my problem:
>
> All my web sites stopped responding so I started by checking dmesg and
> there I found a bunch of these errors
> Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
> Aug 11 16:02:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
>
> But when checking the nfs server lockd was running and I could access
> all the files from the webserver with ls, cd etc.

This is the exact problem we were having here. Rebooting is the only
solution.

And as already mentioned further down the thread it was attributed to
this https://bugzilla.redhat.com/show_bug.cgi?id=453094

My solution was to extract the patch from the upstream kernel in 
http://people.redhat.com/dzickus/el5/103.el5/src/
called
linux-2.6-fs-lockd-nlmsvc_lookup_host-called-with-f_sema-held.patch

and reroll the latest centosplus kernel srpm with it. Servers have been
fine for 6 days running this kernel.

As much as I hate carrying custom kernel rpms, this is a showstopper for
us, and it looks like it won't make it in until 5.3.

Personally, given the limited scope of the patch and the apparent
unwillingness of Red Hat to include it in an update, I'd advocate CentOS
carrying it as a custom patch.

Here's my srpm if anyone wants it, 
http://magoazul.com/tmp/kernel-2.6.18-92.1.10.1.el5.centos.plus.src.rpm
the only change is the patch for this issue. Everything builds cleanly
via mock. 
-- 
Matthew Kent \ SA \ bravenet.com



[CentOS] NFS issues

2008-08-12 Thread Johan Swensson
So I'm running nfs to get content to my web servers. Now I've had this 
problem 2 times (about 2 weeks since the last occurrence).

I use drbd on the nfs server for redundancy. Now to my problem:

All my web sites stopped responding, so I started by checking dmesg, and
there I found a bunch of these errors:

Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
Aug 11 16:02:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
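
For readers who want to pull these lockd warnings out of a larger log, here is a minimal Python sketch. It assumes the usual syslog layout seen in the lines above ("Mon DD HH:MM:SS host kernel: lockd: server X not responding, ...") with each entry on a single line. The regular expression and function names are illustrative only.

```python
import re
from collections import Counter

LOCKD_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"lockd: server (?P<server>\S+) not responding, "
    r"(?P<status>timed out|still trying)"
)


def lockd_events(lines):
    """Yield (timestamp, client, server, status) for each lockd warning."""
    for line in lines:
        match = LOCKD_RE.match(line)
        if match:
            yield (match["stamp"], match["host"],
                   match["server"], match["status"])


def count_by_server(lines):
    """Count lockd warnings per unresponsive server."""
    return Counter(server for _stamp, _client, server, _status
                   in lockd_events(lines))


# The two warnings quoted in this message.
SAMPLE = [
    "Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out",
    "Aug 11 16:02:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out",
]

print(count_by_server(SAMPLE))
```

Feeding it the lines of /var/log/messages from each web server would show which servers are being reported and how often.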


But when checking the nfs server lockd was running and I could access 
all the files from the webserver with ls, cd etc.


The logs on the NFS server don't say anything of interest, and checking
Apache's error_log just shows "not found" or "unable to stat".


As I mentioned, this has happened 2 times, and both times I solved it by
rebooting the NFS server and web servers. Having to reboot my servers
every couple of weeks isn't a good solution, so I really could use some
help. :)


Also, I get this from time to time on the web servers; I don't know if it's related:
do_vfs_lock: VFS is out of sync with lock manager!


Re: [CentOS] NFS issues

2008-08-12 Thread Johan Swensson
It happened again during the night, but now I have temporarily(?) fixed it by
mounting with -o nolock on the web servers.
It works, but dmesg is still spamming "lockd: server 192.168.20.22 not
responding, timed out". At least my sites are up, and the message isn't
critical anymore.

But how can I get rid of it?



Re: [CentOS] NFS issues

2008-08-12 Thread Craig White
On Tue, 2008-08-12 at 14:27 +0200, Johan Swensson wrote:
> So I'm running nfs to get content to my web servers. Now I've had this
> problem 2 times (about 2 weeks since the last occurrence).
> I use drbd on the nfs server for redundancy. Now to my problem:
>
> All my web sites stopped responding so I started by checking dmesg and
> there I found a bunch of these errors
> Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
> Aug 11 16:02:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
>
> But when checking the nfs server lockd was running and I could access
> all the files from the webserver with ls, cd etc.
>
> The logs on the nfs server don't say anything of interest and
> checking Apache's error_log just says not found or unable to stat.
>
> Now I mentioned this has happened 2 times and both these times I've
> solved it by rebooting the nfs server and web servers. This isn't a
> good solution to have to reboot my servers every couple of weeks so I
> really could use some help. :)
>
> Also I get this from time to time on the web servers, dunno if it's
> related.
> do_vfs_lock: VFS is out of sync with lock manager!

I too have been having the same issues with my NFS server, which seem
to have started when I updated to 5.2 on July 27th.

It seems to happen after logrotate runs on Sunday morning, but I don't find
out about it until users show up on Monday morning.

/var/log/messages has...

Aug  4 09:32:59 cube kernel: lockd: server HOSTNAME not responding, still trying

and like you, I've rebooted the main server each time (Monday
mornings)... there's something wrong that I can't figure out.

Craig



Re: [CentOS] NFS issues

2008-08-12 Thread nate
Johan Swensson wrote:
> It happened again this night but now I temporarily(?) fixed it with
> mounting -o nolock on the web servers.
> It works but dmesg is still spamming lockd: server 192.168.20.22 not
> responding, timed out. At least my sites are up, and the message isn't
> critical anymore.
> But how can I get rid of it?

What does 'rpcinfo -p' read on both the servers and the clients?

Also how about /etc/init.d/nfs status (both client and server)
and /etc/init.d/nfslock status (both client and server)

Any firewalls in between client and server?
Run: iptables -L -n (on both client and server)

nate



Re: [CentOS] NFS issues

2008-08-12 Thread Craig White
On Tue, 2008-08-12 at 20:16 -0700, nate wrote:
> Johan Swensson wrote:
>> It happened again this night but now I temporarily(?) fixed it with
>> mounting -o nolock on the web servers.
>> It works but dmesg is still spamming lockd: server 192.168.20.22 not
>> responding, timed out. At least my sites are up, and the message isn't
>> critical anymore.
>> But how can I get rid of it?
>
> What does 'rpcinfo -p' read on both the servers and the clients?
>
> Also how about /etc/init.d/nfs status (both client and server)
> and /etc/init.d/nfslock status (both client and server)
>
> Any firewalls in between client and server?
> Run: iptables -L -n (on both client and server)

I don't want to step on Johan's thread but wanted to commiserate with
him.

No firewalls at present...
nfs and nfslock are running on both client and server and show PIDs.

client
[EMAIL PROTECTED] ~]# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100000    4     0    111  portmapper
    100000    3     0    111  portmapper
    100000    2     0    111  portmapper
    100024    1   udp  50259  status
    100024    1   tcp  53710  status
    100021    1   tcp  53045  nlockmgr
    100021    3   tcp  53045  nlockmgr
    100021    4   tcp  53045  nlockmgr

server
[EMAIL PROTECTED] log]# rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp   4003  status
    100024    1   tcp   4003  status
    100011    1   udp   4000  rquotad
    100011    2   udp   4000  rquotad
    100011    1   tcp   4000  rquotad
    100011    2   tcp   4000  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100021    1   udp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    4   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
    100021    3   tcp   4001  nlockmgr
    100021    4   tcp   4001  nlockmgr
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp   4002  mountd
    100005    1   tcp   4002  mountd
    100005    2   udp   4002  mountd
    100005    2   tcp   4002  mountd
    100005    3   udp   4002  mountd
    100005    3   tcp   4002  mountd

Server has ports fixed in place with settings in /etc/sysconfig/nfs

Craig



Re: [CentOS] NFS issues

2008-08-12 Thread Johan Swensson

No firewall on either end and server responds to ping.

client:
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    889  status
    100024    1   tcp    892  status

server:
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    964  status
    100024    1   tcp    967  status
    100011    1   udp    718  rquotad
    100011    2   udp    718  rquotad
    100011    1   tcp    721  rquotad
    100011    2   tcp    721  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100021    1   udp  32768  nlockmgr
    100021    3   udp  32768  nlockmgr
    100021    4   udp  32768  nlockmgr
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100021    1   tcp  58027  nlockmgr
    100021    3   tcp  58027  nlockmgr
    100021    4   tcp  58027  nlockmgr
    100005    1   udp    778  mountd
    100005    1   tcp    781  mountd
    100005    2   udp    778  mountd
    100005    2   tcp    781  mountd
    100005    3   udp    778  mountd
    100005    3   tcp    781  mountd

However, I just rebooted the NFS server. When I checked before that with
ps -A, lockd was running.
As Craig said, he started noticing this about the time he upgraded to
5.2. The same goes for me: I started getting this problem about the time
I upgraded the clients and server.




--

*Johan Swensson | apegroup*
System Administrator
[EMAIL PROTECTED]
Mobile: +46 (0) 735 21 98 58
www.apegroup.com
Fiskartorpsvägen 52, 115 42 Stockholm