Hello,

When a RHEL 5.7 NFS data server is under heavy I/O load (I/O-intensive HPC
jobs), it sometimes stops responding. It still answers ICMP pings, but
console login is no longer possible, so the machine is effectively dead. I
captured the console messages below; does anybody have an idea whether it's
possible to tune the kernel or XFS to prevent this from happening?
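For context, the traces below show the XFS writeback completion path
(xfs_setfilesize) blocked waiting for an inode lock. Two knobs I have been
looking at are the dirty-page writeback thresholds (so less dirty data piles
up before writeback starts) and the hung-task watchdog that prints the
warning itself. A sketch of an /etc/sysctl.conf fragment, with values that
are purely illustrative guesses on my part, not tested recommendations:

```
# /etc/sysctl.conf fragment -- illustrative values only; apply with `sysctl -p`

# Start background writeback earlier and cap total dirty memory lower,
# so the writeback path has less queued work when load spikes.
vm.dirty_background_ratio = 5
vm.dirty_ratio = 20

# The watchdog that printed the message below; 0 disables the warning,
# a larger value only raises the 120-second reporting threshold.
kernel.hung_task_timeout_secs = 600
```

Note that raising hung_task_timeout_secs only suppresses the report; it does
nothing about the underlying stall.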

Red Hat Enterprise Linux Server release 5.7 (Tikanga)
Kernel 2.6.18-274.7.1.el5 on an x86_64
server login: INFO: task xfsdatad/2:3426 blocked for more than 120
seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfsdatad/2    D ffffffff80154db9     0  3426     71          3427  3425
(L-TLB)
 ffff81011b1f1dc0 0000000000000046 0000000000000000 0000000000000000
 0000000000000100 000000000000000a ffff81011d0d77a0 ffff81011ff24080
 000000f44d72caa3 000000000000071c ffff81011d0d7988 0000000200000000
Call Trace:
 [<ffffffff885d1d16>] :xfs:xfs_end_bio_delalloc+0x0/0x12
 [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
 [<ffffffff885d1ca4>] :xfs:xfs_setfilesize+0x2d/0x8d
 [<ffffffff885d1d1f>] :xfs:xfs_end_bio_delalloc+0x9/0x12
 [<ffffffff8004d32e>] run_workqueue+0x9e/0xfb
 [<ffffffff80049b3d>] worker_thread+0x0/0x122
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049c2d>] worker_thread+0xf0/0x122
 [<ffffffff8008e87f>] default_wake_function+0x0/0xe
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003270f>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032611>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11




Kernel 2.6.18-274.12.1.el5 on an x86_64
server login: INFO: task xfsdatad/2:3421 blocked for more than 120
seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfsdatad/2    D ffffffff80154dc2     0  3421     71          3422  3420
(L-TLB)
 ffff81011a47fdc0 0000000000000046 0000000000000000 0000000000000000
 0000000000000100 000000000000000a ffff810118f1c820 ffff81011ff24080
 00000040fc945f9d 000000000000056f ffff810118f1ca08 0000000200000000
Call Trace:
 [<ffffffff885dcd16>] :xfs:xfs_end_bio_delalloc+0x0/0x12
 [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
 [<ffffffff885dcca4>] :xfs:xfs_setfilesize+0x2d/0x8d
 [<ffffffff885dcd1f>] :xfs:xfs_end_bio_delalloc+0x9/0x12
 [<ffffffff8004d32e>] run_workqueue+0x9e/0xfb
 [<ffffffff80049b3d>] worker_thread+0x0/0x122
 [<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049c2d>] worker_thread+0xf0/0x122
 [<ffffffff8008e880>] default_wake_function+0x0/0xe
 [<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003270f>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032611>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11
Cheers,
Andre
Wageningen University
The Netherlands



_______________________________________________
rhelv5-list mailing list
rhelv5-list@redhat.com
https://www.redhat.com/mailman/listinfo/rhelv5-list