Thanks Ricardo,

We don't want to update the server right now because it's in production. We will plan a system update for the summer, when the system's load is low.
In the most recent incidents a new process is involved: [delete_workqueu]. It is now usually the process that starts the lockup of the D-state processes. I have been looking for information about this process but couldn't find anything. Any ideas?

Regards :)

2010/4/9 Ricardo Argüello <[email protected]>

> Looks like this bug:
>
> GFS2 - probably lost glock call back
> https://bugzilla.redhat.com/show_bug.cgi?id=498976
>
> This is fixed in the kernel included in RHEL 5.5.
> Do a "yum update" to fix it.
>
> Ricardo Arguello
>
> On Tue, Mar 2, 2010 at 6:10 AM, Emilio Arjona <[email protected]> wrote:
> > Thanks for your response, Steve.
> >
> > 2010/3/2 Steven Whitehouse <[email protected]>:
> >> Hi,
> >>
> >> On Fri, 2010-02-26 at 16:52 +0100, Emilio Arjona wrote:
> >>> Hi,
> >>>
> >>> we are experiencing some problems commented in an old thread:
> >>>
> >>> http://www.mail-archive.com/[email protected]/msg07091.html
> >>>
> >>> We have 3 clustered servers under Red Hat 5.4 accessing a GFS2 resource.
> >>>
> >>> fstab options:
> >>> /dev/vg_cluster/lv_cluster /opt/datacluster gfs2
> >>> defaults,noatime,nodiratime,noquota 0 0
> >>>
> >>> GFS options:
> >>> plock_rate_limit="0"
> >>> plock_ownership=1
> >>>
> >>> httpd processes run into D status sometimes and the only solution is
> >>> hard reset the affected server.
> >>>
> >>> Can anyone give me some hints to diagnose the problem?
> >>>
> >>> Thanks :)
> >>>
> >> Can you give me a rough idea of what the actual workload is and how it
> >> is distributed amoung the director(y/ies) ?
> >
> > We had problems with php sessions in the past but we fixed it by
> > configuring php to store the sessions in the database instead of in
> > the GFS filesystem. Now, we're having problems with files and
> > directories in the "data" folder of Moodle LMS.
> >
> > "lsof -p" returned a i/o operation over the same folder in 2/3 nodes,
> > we did a hard reset of these nodes but some hours after the CPU load
> > grew up again, specially in the node that wasn't rebooted. We decided
> > to reboot (vía ssh) this node, then the CPU load went down to normal
> > values in all nodes.
> >
> > I don't think the system's load is high enough to produce concurrent
> > access problems. It's more likely to be some misconfiguration, in
> > fact, we changed some GFS2 options to non default values to increase
> > performance (http://www.linuxdynasty.org/howto-increase-gfs2-performance-in-a-cluster.html).
> >
> >>
> >> This is often down to contention on glocks (one per inode) and maybe
> >> because there is a process of processes writing a file or directory
> >> which is in use (either read-only or writable) by other processes.
> >>
> >> If you are using php, then you might have to strace it to find out what
> >> it is really doing,
> >
> > Ok, we will try to strace the D processes and post the results. Hope
> > we find something!!
> >
> >>
> >> Steve.
> >>
> >>> --
> >>>
> >>> Emilio Arjona.
> >>>
> >>> --
> >>> Linux-cluster mailing list
> >>> [email protected]
> >>> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>
> >>
> >> --
> >> Linux-cluster mailing list
> >> [email protected]
> >> https://www.redhat.com/mailman/listinfo/linux-cluster
> >>
> >
> >
> >
> > --
> > Emilio Arjona.
> >
> > --
> > Linux-cluster mailing list
> > [email protected]
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >
>
> --
> Linux-cluster mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/linux-cluster
>

--
*******************************************
Emilio Arjona Heredia
Centro de Enseñanzas Virtuales de la Universidad de Granada
C/ Real de Cartuja 36-38
http://cevug.ugr.es
Tlfno.: 958-241000 ext. 20206
*******************************************
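
P.S. For anyone following the thread: a minimal sketch of how the processes stuck in D state (such as the httpd and [delete_workqueu] ones mentioned above) could be listed on a node before deciding what to strace. It only reads the per-process stat files under /proc and assumes a Python 3 interpreter is available. The function names are hypothetical, chosen here for illustration, and this is not something we have deployed; it lists candidates, it does not diagnose the glock problem itself.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    start = line.index("(")
    end = line.rindex(")")
    pid = int(line[:start].strip())
    comm = line[start + 1:end]
    # The state letter is the first field after the closing parenthesis.
    state = line[end + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """Return a list of (pid, comm) for processes in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the file was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

Run from cron or a loop during an incident, the output would show which process enters D state first on each node.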
--
Linux-cluster mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/linux-cluster
