Hi All,
We have 6 nodes running GFS2 under CentOS 5.3, all connecting via Cisco 2960G
switches to an MD3000i with 8 x 146GB SAS 15K drives. These nodes run a PHP
website, pulling their PHP and image files from a GFS2 volume exported over
iSCSI from the MD3000i.
The problem is that since inception we've seen the HTTPD processes go into
the 'D' state (uninterruptible sleep) and hang, and the only way we have found
to recover from that is to restart all the nodes in the cluster.
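For anyone wanting to look at the same symptom, here is a minimal sketch (not
something we run in production) that lists processes stuck in state 'D' and the
kernel function each one is sleeping in, read from /proc/<pid>/stat and
/proc/<pid>/wchan. It assumes Python 3 on the node; the function names and the
default process name "httpd" are only illustrative.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state

def read_text(path):
    """Read a small proc file; return '' if it has vanished or is unreadable."""
    try:
        with open(path, errors="replace") as f:
            return f.read().strip()
    except OSError:
        return ""

def find_blocked(proc_root="/proc", name="httpd"):
    """List (pid, wchan) for processes called `name` that are in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue  # process exited between listdir() and open()
        pid, comm, state = parse_stat(stat_text)
        if comm == name and state == "D":
            # wchan names the kernel function the task is sleeping in
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            blocked.append((pid, wchan))
    return blocked
```

Calling find_blocked() on an affected node and printing the result should show
whether the hung processes are all waiting in the same place, which is probably
the first thing worth capturing before the next restart.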
I've tuned demote_secs down from 300 to 20 seconds on the assumption that
file locking is causing the issue. We're also running with the following
GFS values:
<gfs_controld plock_ownership="1" plock_rate_limit="0"/>
Can anyone give me some pointers on what we should be investigating to find
out why this is failing? I've had our networks team crawl over the networking
and that all seems fine. The MTU is set correctly on the MD3000i and on the
individual nodes. I've also used the ping_pong tool, and from a single node we
get around 90K locks per second on a single file on the GFS2 volume. If I run
ping_pong against the same file from two nodes, that drops to around 70 locks
per second. I don't think that's the issue though.
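To make clear what kind of test those numbers come from, here is a rough sketch
of the same style of measurement: cycling fcntl byte-range locks over one file
and counting cycles per second. It is only loosely modelled on what I
understand ping_pong to do and is not a replacement for it. The function name,
the default of 3 locks (number of nodes plus one, for a two-node run) and the
5 second duration are assumptions, and it needs Python 3.

```python
import fcntl
import os
import time

def measure_lock_rate(path, num_locks=3, duration=5.0):
    """Cycle byte-range locks over `path`; return lock/unlock cycles per second."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Start by holding byte 0, then repeatedly take the next byte
        # and release the one currently held.
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
        i = 0
        count = 0
        start = time.monotonic()
        deadline = start + duration
        while time.monotonic() < deadline:
            nxt = (i + 1) % num_locks
            # Blocks here if another process or node holds this byte.
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, nxt, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, i, os.SEEK_SET)
            i = nxt
            count += 1
        elapsed = time.monotonic() - start
        return count / elapsed if elapsed > 0 else 0.0
    finally:
        os.close(fd)  # closing the descriptor drops any locks still held
```

Run against the same file from one node and then from two nodes at once, the
difference in the returned rate should give a feel for how much the contended
case costs compared with the uncontended one.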
If anyone can provide some insight into what to change, what to debug or
how to investigate this further, it'd be greatly appreciated.
Thanks
Gavin
Gavin Conway
Senior Engineer, Operations (Systems Group), UKSolutions