Hi Adam,

thanks for your help. One problem was that we had not mounted the GFS2 file system with the noatime and nodiratime options.
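For the record, the fstab entry now looks roughly like this (the device name below is a placeholder, the mount point is our real one):

# GFS2 mounted without atime updates
/dev/mapper/shared-storage  /mnt/storage  gfs2  noatime,nodiratime  0 0

That alone removed a lot of glock traffic, since atime updates otherwise force extra exclusive locking on every read.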
We still have a problem with Postfix. The gfs2 hang analyzer says:

There is 1 glock with waiters.
node4, pid 20902 is waiting for glock 6/11486739, which is held by pid 12382

Both PIDs are on the same node:

root     12382  0.0  0.0  36844  2300 ?  Ss  12:39  0:00 /usr/lib/postfix/master
root     20902  0.0  0.0  36844  2156 ?  Ds  12:45  0:00 /usr/lib/postfix/master -t

I have no idea what Postfix is trying to do here!

Mario
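P.S. In case it is useful to anyone else, this is roughly how I pulled the raw entry for that glock out of debugfs (the directory under gfs2/ is named <clustername>:<fsname>, hence the wildcard below; the glocks file shows glock numbers in hex, so the number from the analyzer may need converting first):

# mount debugfs if it is not mounted already
mount -t debugfs none /sys/kernel/debug
# the glocks file uses hex glock numbers; convert if the analyzer printed decimal
printf '%x\n' 11486739        # -> af4613
# type 6 is a flock glock, which matches the sys_flock calls in our traces
grep -A5 'n:6/af4613' /sys/kernel/debug/gfs2/*/glocks

The H: lines below the matching G: line list the holder and the waiters.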
On 04.01.11 16:27, Adam Drew wrote:
> Hello,
>
> Processes accessing a GFS2 filesystem falling into D state is typically
> indicative of lock contention; however, other causes are also possible. D
> state is uninterruptible sleep waiting on IO. With regard to GFS2 this means
> that a PID has requested access to some object on disk and has not yet gained
> access to that object. As the PID cannot proceed until it is granted access,
> it hangs in D state.
>
> The most common cause of D-state PIDs on GFS2 is lock contention. GFS2's
> shared locking system is more complex than that of traditional single-node
> filesystems. You can run into a situation where a given PID holds a lock on
> one resource and is waiting in line for a lock on a second resource, while
> the holder of that second resource is waiting for the first PID to release
> its lock. This causes a deadlock where neither process can make progress,
> both end up in D state, and so does any process that requests access to
> either of those resources. In other cases, PIDs may request access to a
> resource on disk faster than they release it. The queue of waiters then
> builds and builds until the filesystem grinds to a halt and appears to
> "hang." In still other cases, bugs or design issues may lead to locking
> bottlenecks.
>
> GFS2 locks are arbitrated in the glock (pronounced gee-lock) layer. The
> glock subsystem is exposed via debugfs. You can mount debugfs, look in the
> gfs2 directory, and view the glocks. You can then match up the glocks with
> the process list on the system and with the messages logs. Doing this for
> every node in the cluster can reveal problems. If you have Red Hat support,
> I encourage you to engage them, as learning to read glocks can be a
> non-trivial process, but it is not impossible. They are documented to a
> degree in the following documents:
>
> "Testing and verification of cluster filesystems" by Steven Whitehouse
> http://www.kernel.org/doc/ols/2009/ols2009-pages-311-318.pdf
>
> Global File System 2, Edition 7, section 1.4. "GFS2 Node Locking"
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html-single/Global_File_System_2/index.html#s1-ov-lockbounce
>
> More information is available out on the web.
>
> Regards,
> Adam Drew
>
> ----- Original Message -----
> From: "Emilio Arjona" <emi...@ugr.es>
> To: "linux clustering" <linux-cluster@redhat.com>
> Sent: Tuesday, January 4, 2011 6:27:52 AM
> Subject: Re: [Linux-cluster] Processes in D state
>
> Same problem here,
>
> in a webserver cluster, httpd sometimes runs into D state. I have to
> restart the node, or even the whole cluster if more than one node is
> locked. I'm using REDHAT 5.4 and HP hardware.
>
> Regards,
>
> 2011/1/4 Paras pradhan <pradhanpa...@gmail.com>
>
> > I had the same problem. It locked the whole GFS cluster and I had to
> > reboot the node. After the reboot all is fine now, but I am still
> > trying to find out what caused it.
> >
> > Paras
> >
> > On Monday, January 3, 2011, InterNetworX | Hostmaster
> > <hostmas...@inwx.de> wrote:
> >> Hello,
> >>
> >> we are using GFS2, but sometimes there are processes hanging in D state:
> >>
> >> # ps axl | grep D
> >> F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN STAT TTY   TIME COMMAND
> >> 0     0 14220 14219  20   0 19624 1916 -     Ds   ?     0:00 /usr/lib/postfix/master -t
> >> 0     0 14555 14498  20   0 16608 1716 -     D+   /mnt/storage/openvz/root/129/dev/pts/0 0:00 apt-get install less
> >> 0     0 15068 15067  19  -1 36844 2156 -     D<s  ?     0:00 /usr/lib/postfix/master -t
> >> 0     0 16603 16602  19  -1 36844 2156 -     D<s  ?     0:00 /usr/lib/postfix/master -t
> >> 4   101 19534 13238  19  -1 33132 2984 -     D<   ?     0:00 smtpd -n smtp -t inet -u -c
> >> 4   101 19542 13238  19  -1 33116 2976 -     D<   ?     0:00 smtpd -n smtp -t inet -u -c
> >> 0     0 19735 13068  20   0  7548  880 -     S+   pts/0 0:00 grep D
> >>
> >> dmesg shows this message many times:
> >>
> >> [11142.334229] INFO: task master:14220 blocked for more than 120 seconds.
> >> [11142.334266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> [11142.334310] master D ffff88032b644800 0 14220 14219 0x00000000
> >> [11142.334315] ffff88062dd40000 0000000000000086 0000000000000000 ffffffffa02628d9
> >> [11142.334318] ffff88017a517ef8 000000000000fa40 ffff88017a517fd8 0000000000016940
> >> [11142.334322] 0000000000016940 ffff88032b644800 ffff88032b644af8 0000000b7a517cd8
> >> [11142.334325] Call Trace:
> >> [11142.334340] [<ffffffffa02628d9>] ? gfs2_glock_put+0xf9/0x118 [gfs2]
> >> [11142.334347] [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> >> [11142.334353] [<ffffffffa0261db9>] ? gfs2_glock_holder_wait+0x9/0xd [gfs2]
> >> [11142.334358] [<ffffffff812e9897>] ? __wait_on_bit+0x41/0x70
> >> [11142.334363] [<ffffffffa0261db0>] ? gfs2_glock_holder_wait+0x0/0xd [gfs2]
> >> [11142.334367] [<ffffffff812e9931>] ? out_of_line_wait_on_bit+0x6b/0x77
> >> [11142.334370] [<ffffffff81066808>] ? wake_bit_function+0x0/0x23
> >> [11142.334376] [<ffffffffa0261d9e>] ? gfs2_glock_wait+0x23/0x28 [gfs2]
> >> [11142.334383] [<ffffffffa026b2b0>] ? gfs2_flock+0x17c/0x1f9 [gfs2]
> >> [11142.334386] [<ffffffff810e735d>] ? virt_to_head_page+0x9/0x2a
> >> [11142.334389] [<ffffffff810e743e>] ? ub_slab_ptr+0x22/0x65
> >> [11142.334393] [<ffffffff8112221b>] ? sys_flock+0xff/0x12a
> >> [11142.334396] [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
> >>
> >> Any idea what is going wrong? Do you need any more information?
> >>
> >> Mario

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster