Hello,

When a RHEL 5.7 NFS data server is under heavy I/O load (I/O-intensive HPC jobs), the server sometimes stops responding. It still answers ICMP pings, but console login is no longer possible, so it is effectively dead. I captured the console messages. Does anybody have an idea whether it's possible to tune the kernel or XFS to prevent this from happening?

Red Hat Enterprise Linux Server release 5.7 (Tikanga)
Kernel 2.6.18-274.7.1.el5 on an x86_64

server login: INFO: task xfsdatad/2:3426 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfsdatad/2    D ffffffff80154db9     0  3426     71          3427  3425 (L-TLB)
 ffff81011b1f1dc0 0000000000000046 0000000000000000 0000000000000000
 0000000000000100 000000000000000a ffff81011d0d77a0 ffff81011ff24080
 000000f44d72caa3 000000000000071c ffff81011d0d7988 0000000200000000
Call Trace:
 [<ffffffff885d1d16>] :xfs:xfs_end_bio_delalloc+0x0/0x12
 [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
 [<ffffffff885d1ca4>] :xfs:xfs_setfilesize+0x2d/0x8d
 [<ffffffff885d1d1f>] :xfs:xfs_end_bio_delalloc+0x9/0x12
 [<ffffffff8004d32e>] run_workqueue+0x9e/0xfb
 [<ffffffff80049b3d>] worker_thread+0x0/0x122
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049c2d>] worker_thread+0xf0/0x122
 [<ffffffff8008e87f>] default_wake_function+0x0/0xe
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003270f>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a2c39>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032611>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Kernel 2.6.18-274.12.1.el5 on an x86_64

server login: INFO: task xfsdatad/2:3421 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfsdatad/2    D ffffffff80154dc2     0  3421     71          3422  3420 (L-TLB)
 ffff81011a47fdc0 0000000000000046 0000000000000000 0000000000000000
 0000000000000100 000000000000000a ffff810118f1c820 ffff81011ff24080
 00000040fc945f9d 000000000000056f ffff810118f1ca08 0000000200000000
Call Trace:
 [<ffffffff885dcd16>] :xfs:xfs_end_bio_delalloc+0x0/0x12
 [<ffffffff800645e3>] __down_write_nested+0x7a/0x92
 [<ffffffff885dcca4>] :xfs:xfs_setfilesize+0x2d/0x8d
 [<ffffffff885dcd1f>] :xfs:xfs_end_bio_delalloc+0x9/0x12
 [<ffffffff8004d32e>] run_workqueue+0x9e/0xfb
 [<ffffffff80049b3d>] worker_thread+0x0/0x122
 [<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80049c2d>] worker_thread+0xf0/0x122
 [<ffffffff8008e880>] default_wake_function+0x0/0xe
 [<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003270f>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a2c3a>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032611>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Cheers,
Andre
Wageningen University
The Netherlands

_______________________________________________
rhelv5-list mailing list
rhelv5-list@redhat.com
https://www.redhat.com/mailman/listinfo/rhelv5-list