Re: [CentOS] NFS issues
On Thu, Sep 4, 2008 at 8:09 AM, Akemi Yagi [EMAIL PROTECTED] wrote:
> On Thu, Sep 4, 2008 at 7:35 AM, Akemi Yagi [EMAIL PROTECTED] wrote:
>> CentOS developer, Tru, compiled a patched version of the regular kernel and is offering it at:
>> http://people.centos.org/tru/kernel+bz453094/
>>
>> Also, the fix will be in the upcoming kernel-2.6.18-92.1.13.el5 according to the bugzilla referred to above.
>
> The bugzilla link is actually this one:
> https://bugzilla.redhat.com/show_bug.cgi?id=459083
>
> Akemi

kernel-2.6.18-92.1.13.el5 is out (upstream):
http://rhn.redhat.com/errata/RHSA-2008-0885.html

Akemi
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
Re: [CentOS] NFS issues
On Wed, 2008-09-24 at 13:38 -0700, Akemi Yagi wrote:
> On Thu, Sep 4, 2008 at 8:09 AM, Akemi Yagi [EMAIL PROTECTED] wrote:
>> On Thu, Sep 4, 2008 at 7:35 AM, Akemi Yagi [EMAIL PROTECTED] wrote:
>>> CentOS developer, Tru, compiled a patched version of the regular kernel and is offering it at:
>>> http://people.centos.org/tru/kernel+bz453094/
>>>
>>> Also, the fix will be in the upcoming kernel-2.6.18-92.1.13.el5 according to the bugzilla referred to above.
>>
>> The bugzilla link is actually this one:
>> https://bugzilla.redhat.com/show_bug.cgi?id=459083
>>
>> Akemi
>
> kernel-2.6.18-92.1.13.el5 is out (upstream):
> http://rhn.redhat.com/errata/RHSA-2008-0885.html

yep and I'm still running an old kernel to get around this - got the notification from bugzilla today myself - hooray

Craig
Re: [CentOS] NFS issues
On Thu, Sep 4, 2008 at 7:35 AM, Akemi Yagi [EMAIL PROTECTED] wrote:
> CentOS developer, Tru, compiled a patched version of the regular kernel and is offering it at:
> http://people.centos.org/tru/kernel+bz453094/
>
> Also, the fix will be in the upcoming kernel-2.6.18-92.1.13.el5 according to the bugzilla referred to above.

The bugzilla link is actually this one:
https://bugzilla.redhat.com/show_bug.cgi?id=459083

Akemi
Re: [CentOS] NFS issues
On Wed, Aug 13, 2008 at 09:48, Johan Swensson [EMAIL PROTECTED] wrote:
> I was also thinking about mounting the nfs shares as soft, is this a good idea?

No, this is a bad idea. Mounting as soft means that if there are any errors or timeouts, your writes will fail, and most programs don't check the status of those, so you will have undetectable data loss.

> And also, what's the difference between soft and intr?

Intr (which is a good idea) means that you can use kill to stop processes that are hung waiting for the NFS server.

The problem with intr is that I have never seen it work. When my NFS server goes down, the processes that are waiting for it stay in D state, no matter whether I try kill or even kill -9 on them... So, although intr seems like a good idea, in practice it does not make much of a difference.

HTH,
Filipe
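To see whether a client currently has any share mounted soft, something along these lines might help. It is only a minimal sketch: it assumes input in the /proc/mounts layout (device, mount point, filesystem type, comma-separated options), and the function name list_soft_nfs_mounts is made up for this example.

```shell
# Print the mount point of every NFS mount whose option list contains "soft".
# Reads /proc/mounts-style lines on stdin:
#   device mountpoint fstype options dump pass
# The function name is hypothetical, not part of any package.
list_soft_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "soft") { print $2; break }
    }'
}
```

On a client it would be called as `list_soft_nfs_mounts < /proc/mounts`; no output means no NFS share is mounted soft.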
Re: [CentOS] NFS issues
Not wanting to hijack the thread, but since a similar date I've had issues with NFS updates being 'delayed' for anything between two seconds to six hours. Weird one.
Re: [CentOS] NFS issues
Johan Swensson wrote:
> No firewall on either end and server responds to ping.
> client:
>    program vers proto   port
>     100000    2   tcp    111  portmapper
>     100000    2   udp    111  portmapper
>     100024    1   udp    889  status
>     100024    1   tcp    892  status

Doesn't look like nfslock is running on the client? What does /etc/init.d/nfslock status say?

> As Craig said he started noticing this about the time he upgraded to 5.2, the same goes for me, started getting this problem about the time I upgraded the clients and server.

Maybe related to this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=453094

Try restarting nfslock on both client and server when it occurs? Or try setting up a cron job to restart nfslock hourly on all systems to see if that can work around the issue until a fix comes out?

nate
Re: [CentOS] NFS issues
nate wrote:
> Johan Swensson wrote:
>> No firewall on either end and server responds to ping.
>> client:
>>    program vers proto   port
>>     100000    2   tcp    111  portmapper
>>     100000    2   udp    111  portmapper
>>     100024    1   udp    889  status
>>     100024    1   tcp    892  status
>
> Doesn't look like nfslock is running on the client? What does /etc/init.d/nfslock status say?

[EMAIL PROTECTED] ~]# service nfslock status
rpc.statd (pid 2737) is running...

>> As Craig said he started noticing this about the time he upgraded to 5.2, the same goes for me, started getting this problem about the time I upgraded the clients and server.
>
> Maybe related to this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=453094
>
> Try restarting nfslock on both client and server when it occurs? Or try setting up a cron job to restart nfslock hourly on all systems to see if that can work around the issue until a fix comes out?
>
> nate

Actually I tried restarting both nfslock (on clients and server) and nfs (on server) but it didn't help. Is my solution of mounting with nolock bad?

I was also thinking about mounting the nfs shares as soft, is this a good idea? Could it help me? And also, what's the difference between soft and intr? I read the manual and I thought they were pretty similar.
Re: [CentOS] NFS issues
On Tue, 2008-08-12 at 14:27 +0200, Johan Swensson wrote:
> So I'm running nfs to get content to my web servers. Now I've had this problem 2 times (about 2 weeks since the last occurrence). I use drbd on the nfs server for redundancy.
>
> Now to my problem: All my web sites stopped responding so I started by checking dmesg and there I found a bunch of these errors:
>
> Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
> Aug 11 16:02:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
>
> But when checking the nfs server, lockd was running and I could access all the files from the webserver with ls, cd etc.

This is the exact problem we were having here. Rebooting is the only solution. And as already mentioned further down the thread it was attributed to this
https://bugzilla.redhat.com/show_bug.cgi?id=453094

My solution was to extract the patch from the upstream kernel in
http://people.redhat.com/dzickus/el5/103.el5/src/
called linux-2.6-fs-lockd-nlmsvc_lookup_host-called-with-f_sema-held.patch and reroll the latest centosplus kernel srpm with it. Servers have been fine for 6 days running this kernel.

As much as I hate carrying custom kernel rpms this is a showstopper for us, and it looks like it won't make it in until 5.3. Personally, given the limited scope of the patch and the apparent unwillingness of Red Hat to include it in an update, I'd advocate CentOS carrying it as a custom patch.

Here's my srpm if anyone wants it,
http://magoazul.com/tmp/kernel-2.6.18-92.1.10.1.el5.centos.plus.src.rpm
the only change is the patch for this issue. Everything builds cleanly via mock.

--
Matthew Kent \ SA \ bravenet.com
[CentOS] NFS issues
So I'm running nfs to get content to my web servers. Now I've had this problem 2 times (about 2 weeks since the last occurrence). I use drbd on the nfs server for redundancy.

Now to my problem: All my web sites stopped responding so I started by checking dmesg and there I found a bunch of these errors:

Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
Aug 11 16:02:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out

But when checking the nfs server, lockd was running and I could access all the files from the webserver with ls, cd etc. The logs on the nfs server don't say anything of interest, and checking Apache's error_log just says not found or unable to stat.

As I mentioned, this has happened 2 times, and both times I've solved it by rebooting the nfs server and web servers. Having to reboot my servers every couple of weeks isn't a good solution so I really could use some help. :)

Also I get this from time to time on the web servers, dunno if it's related:

do_vfs_lock: VFS is out of sync with lock manager!
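For anyone wanting to see how often each client complains and about which server, here is a rough sketch. It assumes syslog-style lines on stdin in the format shown above; the function name count_lockd_timeouts is invented for the example, and it counts both the "timed out" and "still trying" variants of the message.

```shell
# Count "lockd: server X not responding" messages per server name.
# Reads syslog/dmesg-style lines on stdin and prints "<server> <count>".
# The function name is hypothetical.
count_lockd_timeouts() {
    awk '/lockd: server .* not responding/ {
        # The server name is the word right after "server".
        for (i = 1; i < NF; i++)
            if ($i == "server") { count[$(i + 1)]++; break }
    }
    END { for (s in count) print s, count[s] }'
}
```

It could be fed from the log file, for example `count_lockd_timeouts < /var/log/messages`, on each web server to compare when the messages start.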
Re: [CentOS] NFS issues
It happened again during the night but now I temporarily(?) fixed it by mounting with -o nolock on the web servers. It works but dmesg is still spamming
lockd: server 192.168.20.22 not responding, timed out.
At least my sites are up, and the message isn't critical anymore. But how can I get rid of it?

Johan Swensson wrote:
> So I'm running nfs to get content to my web servers. Now I've had this problem 2 times (about 2 weeks since the last occurrence). I use drbd on the nfs server for redundancy.
>
> Now to my problem: All my web sites stopped responding so I started by checking dmesg and there I found a bunch of these errors:
>
> Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
> Aug 11 16:02:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
>
> But when checking the nfs server, lockd was running and I could access all the files from the webserver with ls, cd etc. The logs on the nfs server don't say anything of interest, and checking Apache's error_log just says not found or unable to stat.
>
> As I mentioned, this has happened 2 times, and both times I've solved it by rebooting the nfs server and web servers. Having to reboot my servers every couple of weeks isn't a good solution so I really could use some help. :)
>
> Also I get this from time to time on the web servers, dunno if it's related:
>
> do_vfs_lock: VFS is out of sync with lock manager!
Re: [CentOS] NFS issues
On Tue, 2008-08-12 at 14:27 +0200, Johan Swensson wrote:
> So I'm running nfs to get content to my web servers. Now I've had this problem 2 times (about 2 weeks since the last occurrence). I use drbd on the nfs server for redundancy.
>
> Now to my problem: All my web sites stopped responding so I started by checking dmesg and there I found a bunch of these errors:
>
> Aug 11 16:00:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
> Aug 11 16:02:39 web03 kernel: lockd: server 192.168.20.22 not responding, timed out
>
> But when checking the nfs server, lockd was running and I could access all the files from the webserver with ls, cd etc. The logs on the nfs server don't say anything of interest, and checking Apache's error_log just says not found or unable to stat.
>
> As I mentioned, this has happened 2 times, and both times I've solved it by rebooting the nfs server and web servers. Having to reboot my servers every couple of weeks isn't a good solution so I really could use some help. :)
>
> Also I get this from time to time on the web servers, dunno if it's related:
>
> do_vfs_lock: VFS is out of sync with lock manager!

I too have been having the same issues with my nfs server - which seem to have started when I updated on July 27th (5.2)

It seems to happen after logrotate on Sunday morning but I don't know about it until users show up on Monday mornings.

/var/log/messages has...
Aug 4 09:32:59 cube kernel: lockd: server HOSTNAME not responding, still trying

and like you, I've rebooted the main server each time (Monday mornings)...there's something wrong that I can't figure out

Craig
Re: [CentOS] NFS issues
Johan Swensson wrote:
> It happened again during the night but now I temporarily(?) fixed it by mounting with -o nolock on the web servers. It works but dmesg is still spamming
> lockd: server 192.168.20.22 not responding, timed out.
> At least my sites are up, and the message isn't critical anymore. But how can I get rid of it?

What does 'rpcinfo -p' read on both the servers and the clients?

Also how about
/etc/init.d/nfs status (both client and server)
and
/etc/init.d/nfslock status (both client and server)

Any firewalls in between client and server? Run:
iptables -L -n
(on both client and server)

nate
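The checks asked for above can be collected in one go with a small wrapper, sketched here. The commands are the ones from the message; the function names run_diag and collect_nfs_diag are invented for the example, and running it as root is assumed since iptables needs it.

```shell
# Run one diagnostic command under a header line; keep going if the
# command is missing or fails. Function names here are hypothetical.
run_diag() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(command not found: $1)"
    fi
}

# The checks requested in the thread, to be run on both client and server.
collect_nfs_diag() {
    run_diag rpcinfo -p
    run_diag /etc/init.d/nfs status
    run_diag /etc/init.d/nfslock status
    run_diag iptables -L -n
}
```

Running `collect_nfs_diag > nfs-diag-$(hostname).txt` on each machine gives one file per host that can be pasted into a reply.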
Re: [CentOS] NFS issues
On Tue, 2008-08-12 at 20:16 -0700, nate wrote:
> Johan Swensson wrote:
>> It happened again during the night but now I temporarily(?) fixed it by mounting with -o nolock on the web servers. It works but dmesg is still spamming
>> lockd: server 192.168.20.22 not responding, timed out.
>> At least my sites are up, and the message isn't critical anymore. But how can I get rid of it?
>
> What does 'rpcinfo -p' read on both the servers and the clients?
>
> Also how about
> /etc/init.d/nfs status (both client and server)
> and
> /etc/init.d/nfslock status (both client and server)
>
> Any firewalls in between client and server? Run:
> iptables -L -n
> (on both client and server)

I don't want to step on Johan's thread but wanted to commiserate with him.

No firewalls at present... nfs and nfslock on both client and server are running and show pids

client
[EMAIL PROTECTED] ~]# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100000    4     0    111  portmapper
    100000    3     0    111  portmapper
    100000    2     0    111  portmapper
    100024    1   udp  50259  status
    100024    1   tcp  53710  status
    100021    1   tcp  53045  nlockmgr
    100021    3   tcp  53045  nlockmgr
    100021    4   tcp  53045  nlockmgr

server
[EMAIL PROTECTED] log]# rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp   4003  status
    100024    1   tcp   4003  status
    100011    1   udp   4000  rquotad
    100011    2   udp   4000  rquotad
    100011    1   tcp   4000  rquotad
    100011    2   tcp   4000  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100021    1   udp   4001  nlockmgr
    100021    3   udp   4001  nlockmgr
    100021    4   udp   4001  nlockmgr
    100021    1   tcp   4001  nlockmgr
    100021    3   tcp   4001  nlockmgr
    100021    4   tcp   4001  nlockmgr
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp   4002  mountd
    100005    1   tcp   4002  mountd
    100005    2   udp   4002  mountd
    100005    2   tcp   4002  mountd
    100005    3   udp   4002  mountd
    100005    3   tcp   4002  mountd

Server has ports fixed in place with settings in /etc/sysconfig/nfs

Craig
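For readers wondering what such settings look like: Craig did not post his file, so the fragment below is only a guess at an /etc/sysconfig/nfs that would produce the fixed ports 4000 to 4003 seen in his server listing. The variable names are the ones the stock CentOS 5 file ships commented out; the values are inferred from the rpcinfo output, not taken from his system.

```shell
# /etc/sysconfig/nfs (fragment). Values are inferred from the rpcinfo
# listing in the message above; they are an assumption, not Craig's file.
RQUOTAD_PORT=4000
LOCKD_TCPPORT=4001
LOCKD_UDPPORT=4001
MOUNTD_PORT=4002
STATD_PORT=4003
```

The nfs and nfslock services would presumably need a restart before new port settings show up in rpcinfo -p.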
Re: [CentOS] NFS issues
No firewall on either end and server responds to ping.

client:
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    889  status
    100024    1   tcp    892  status

server:
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp    964  status
    100024    1   tcp    967  status
    100011    1   udp    718  rquotad
    100011    2   udp    718  rquotad
    100011    1   tcp    721  rquotad
    100011    2   tcp    721  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100021    1   udp  32768  nlockmgr
    100021    3   udp  32768  nlockmgr
    100021    4   udp  32768  nlockmgr
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100021    1   tcp  58027  nlockmgr
    100021    3   tcp  58027  nlockmgr
    100021    4   tcp  58027  nlockmgr
    100005    1   udp    778  mountd
    100005    1   tcp    781  mountd
    100005    2   udp    778  mountd
    100005    2   tcp    781  mountd
    100005    3   udp    778  mountd
    100005    3   tcp    781  mountd

However I just rebooted the nfs server. But when I checked before, lockd was running according to ps -A.

As Craig said he started noticing this about the time he upgraded to 5.2, the same goes for me, started getting this problem about the time I upgraded the clients and server.

nate wrote:
> Johan Swensson wrote:
>> It happened again during the night but now I temporarily(?) fixed it by mounting with -o nolock on the web servers. It works but dmesg is still spamming
>> lockd: server 192.168.20.22 not responding, timed out.
>> At least my sites are up, and the message isn't critical anymore. But how can I get rid of it?
>
> What does 'rpcinfo -p' read on both the servers and the clients?
>
> Also how about
> /etc/init.d/nfs status (both client and server)
> and
> /etc/init.d/nfslock status (both client and server)
>
> Any firewalls in between client and server? Run:
> iptables -L -n
> (on both client and server)
>
> nate

--
Johan Swensson | apegroup
System Administrator
[EMAIL PROTECTED]
Mobile: +46 (0) 735 21 98 58
www.apegroup.com
Fiskartorpsvägen 52, 115 42 Stockholm
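Comparing listings like these by eye is error-prone, so here is a small sketch that reports which expected services are absent from rpcinfo -p output. It assumes the usual layout with one header line and the service name in the last column; the function name missing_rpc_services is made up for the example.

```shell
# Print each expected service name that does not appear in the last
# column of `rpcinfo -p` output read from stdin (first line is a header).
# Usage: rpcinfo -p | missing_rpc_services portmapper status nlockmgr
# The function name is hypothetical.
missing_rpc_services() {
    registered=$(awk 'NR > 1 { print $NF }' | sort -u)
    for svc in "$@"; do
        echo "$registered" | grep -qxF "$svc" || echo "$svc"
    done
}
```

Fed the client listing above with `portmapper status nlockmgr` as arguments, it would report nlockmgr, since that listing only registers portmapper and status.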