Hi Adam,

thanks for your help. One problem was that we had not mounted the GFS2 file system with the noatime and nodiratime options.
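For the record, the fstab entry now looks roughly like this (the device name below is a placeholder, the mount point is our real one):

# GFS2 mounted without atime updates
/dev/mapper/shared-storage  /mnt/storage  gfs2  noatime,nodiratime  0 0

That alone removed a lot of glock traffic, since atime updates otherwise force extra exclusive locking on every read.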
We still have a problem with Postfix. The gfs2 hang analyzer says:

There is 1 glock with waiters.
node4, pid 20902 is waiting for glock 6/11486739, which is held by pid 12382

Both PIDs are on the same node:

root     12382  0.0  0.0  36844  2300 ?  Ss  12:39  0:00 /usr/lib/postfix/master
root     20902  0.0  0.0  36844  2156 ?  Ds  12:45  0:00 /usr/lib/postfix/master -t

I have no idea what Postfix is trying to do here!

Mario
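P.S. In case it is useful to anyone else, this is roughly how I pulled the raw entry for that glock out of debugfs (the directory under gfs2/ is named <clustername>:<fsname>, hence the wildcard below; the glocks file shows glock numbers in hex, so the number from the analyzer may need converting first):

# mount debugfs if it is not mounted already
mount -t debugfs none /sys/kernel/debug
# the glocks file uses hex glock numbers; convert if the analyzer printed decimal
printf '%x\n' 11486739        # -> af4613
# type 6 is a flock glock, which matches the sys_flock calls in our traces
grep -A5 'n:6/af4613' /sys/kernel/debug/gfs2/*/glocks

The H: lines below the matching G: line list the holder and the waiters.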
On 04.01.11 16:27, Adam Drew wrote:
> Hello,
>
> Processes accessing a GFS2 filesystem falling into D state is typically
> indicative of lock contention; however, other causes are also possible. D
> state is uninterruptible sleep waiting on IO. With regard to GFS2 this means
> that a PID has requested access to some object on disk and has not yet gained
> access to that object. As the PID cannot proceed until it is granted access,
> it hangs in D state.
>
> The most common cause of D-state PIDs on GFS2 is lock contention. GFS2's
> shared locking system is more complex than that of traditional single-node
> filesystems. You can run into a situation where a given PID holds a lock on
> one resource and is waiting in line for a lock on a second resource, while
> the holder of that second resource is waiting for the first PID to release
> its lock. This causes a deadlock where neither process can make progress,
> both end up in D state, and so does any process that requests access to
> either of those resources. In other cases, PIDs may request access to a
> resource on disk faster than they release it. The queue of waiters then
> builds and builds until the filesystem grinds to a halt and appears to
> "hang." In still other cases, bugs or design issues may lead to locking
> bottlenecks.
>
> GFS2 locks are arbitrated in the glock (pronounced gee-lock) layer. The
> glock subsystem is exposed via debugfs. You can mount debugfs, look in the
> gfs2 directory, and view the glocks. You can then match up the glocks with
> the process list on the system and with the messages logs. Doing this for
> every node in the cluster can reveal problems. If you have Red Hat support,
> I encourage you to engage them, as learning to read glocks can be a
> non-trivial process, but it is not impossible. They are documented to a
> degree in the following documents:
>
> "Testing and verification of cluster filesystems" by Steven Whitehouse
> http://www.kernel.org/doc/ols/2009/ols2009-pages-311-318.pdf
>
> Global File System 2, Edition 7, section 1.4. "GFS2 Node Locking"
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Global_File_System_2/index.html#s1-ov-lockbounce
>
> More information is available out on the web.
>
> Regards,
> Adam Drew
>
> ----- Original Message -----
> From: "Emilio Arjona" <emi...@ugr.es>
> To: "linux clustering" <linux-cluster@redhat.com>
> Sent: Tuesday, January 4, 2011 6:27:52 AM
> Subject: Re: [Linux-cluster] Processes in D state
>
> Same problem here,
>
> in a webserver cluster, httpd sometimes runs into D state. I have to
> restart the node, or even the whole cluster if more than one node is
> locked. I'm using REDHAT 5.4 and HP hardware.
>
> Regards,
>
> 2011/1/4 Paras pradhan <pradhanpa...@gmail.com>
>
> > I had the same problem. It locked the whole GFS cluster and I had to
> > reboot the node. After the reboot all is fine now, but I am still
> > trying to find out what caused it.
> >
> > Paras
> >
> > On Monday, January 3, 2011, InterNetworX | Hostmaster
> > <hostmas...@inwx.de> wrote:
> >> Hello,
> >>
> >> we are using GFS2, but sometimes there are processes hanging in D state:
> >>
> >> # ps axl | grep D
> >> F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN STAT TTY   TIME COMMAND
> >> 0     0 14220 14219  20   0 19624 1916 -     Ds   ?     0:00 /usr/lib/postfix/master -t
> >> 0     0 14555 14498  20   0 16608 1716 -     D+   /mnt/storage/openvz/root/129/dev/pts/0 0:00 apt-get install less
> >> 0     0 15068 15067  19  -1 36844 2156 -     D<s  ?     0:00 /usr/lib/postfix/master -t
> >> 0     0 16603 16602  19  -1 36844 2156 -     D<s  ?     0:00 /usr/lib/postfix/master -t
> >> 4   101 19534 13238  19  -1 33132 2984 -     D<   ?     0:00 smtpd -n smtp -t inet -u -c
> >> 4   101 19542 13238  19  -1 33116 2976 -     D<   ?     0:00 smtpd -n smtp -t inet -u -c
> >> 0     0 19735 13068  20   0  7548  880 -     S+   pts/0 0:00 grep D
> >>
> >> dmesg shows this message many times:
> >>
> >> [11142.334229] INFO: task master:14220 blocked for more than 120 seconds.
> >> [11142.334266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> [11142.334310] master D ffff88032b644800 0 14220 14219 0x00000000
> >> [11142.334315] ffff88062dd40000 0000000000000086 0000000000000000 ffffffffa02628d9
> >> [11142.334318] ffff88017a517ef8 000000000000fa40 ffff88017a517fd8 0000000000016940
> >> [11142.334322] 0000000000016940 ffff88032b644800 ffff88032b644af8 0000000b7a517cd8
> >> [11142.334325] Call Trace:
> >> [11142.334340] [<ffffffffa02628d9>] ? gfs2_glock_put+0xf9/0x118 [gfs2]
> >> [11142.334347] [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> >> [11142.334353] [<ffffffffa0261db9>] ? gfs2_glock_holder_wait+0x9/0xd [gfs2]
> >> [11142.334358] [<ffffffff812e9897>] ? __wait_on_bit+0x41/0x70
> >> [11142.334363] [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> >> [11142.334367] [<ffffffff812e9931>] ? out_of_line_wait_on_bit+0x6b/0x77
> >> [11142.334370] [<ffffffff81066808>] ? wake_bit_function+0x0/0x23
> >> [11142.334376] [<ffffffffa0261d9e>] ? gfs2_glock_wait+0x23/0x28 [gfs2]
> >> [11142.334383] [<ffffffffa026b2b0>] ? gfs2_flock+0x17c/0x1f9 [gfs2]
> >> [11142.334386] [<ffffffff810e735d>] ? virt_to_head_page+0x9/0x2a
> >> [11142.334389] [<ffffffff810e743e>] ? ub_slab_ptr+0x22/0x65
> >> [11142.334393] [<ffffffff8112221b>] ? sys_flock+0xff/0x12a
> >> [11142.334396] [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
> >>
> >> Any idea what is going wrong? Do you need any more information?
> >>
> >> Mario

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster