On Tue, 2009-10-06 at 12:48 +0200, Michael Schwartzkopff wrote:
> Hi,

Hi,

> my system load shows that quite a number of processes are waiting.

Blocked. I guess the word waiting is similar.

> My questions are:
> What causes the problem?

In this case, the thread has lbugged previously. If you look in the syslog of the node with these processes, you should find entries with LBUG and/or ASSERTION messages. These are the defects that are causing the processes to get blocked (uninterruptible sleep).

> Can I kill the "hanging" processes?

Nope. You have to reboot the node.

Please search bugzilla for the LBUG/ASSERTIONs you are getting and, if you don't find anything that matches, please file a new bug.

> Oct 5 10:28:03 sosmds2 kernel: Lustre: 0:0:(watchdog.c:181:lcw_cb()) Watchdog triggered for pid 28402: it was inactive for 200.00s
> Oct 5 10:28:03 sosmds2 kernel: ll_mdt_35 D ffff81000100c980 0 28402 1 28403 28388 (L-TLB)
> Oct 5 10:28:03 sosmds2 kernel: ffff81041c723810 0000000000000046 0000000000000000 7fffffffffffffff
> Oct 5 10:28:03 sosmds2 kernel: ffff81041c7237d0 0000000000000001 ffff81022f3e60c0 ffff81022f12e080
> Oct 5 10:28:03 sosmds2 kernel: 000177b2feff847c 00000000000014df ffff81022f3e62a8 000000010000028f
> Oct 5 10:28:03 sosmds2 kernel: Call Trace:
> Oct 5 10:28:03 sosmds2 kernel: [<ffffffff8008a3ef>] default_wake_function+0x0/0xe
> Oct 5 10:28:03 sosmds2 kernel: [<ffffffff885b1b26>] :libcfs:lbug_with_loc+0xc6/0xd0

Here's where you can see that the thread has lbugged.

b.
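
The syslog search suggested above could be done with something like the following. This is only a minimal sketch: the function names and the log path in the usage note are assumptions for illustration, not part of Lustre, and the pattern simply matches the strings mentioned in this reply (a case-insensitive "LBUG" also catches `lbug_with_loc` in a call trace).

```python
import re

# Strings named in the reply above. IGNORECASE lets "LBUG" also match
# the lbug_with_loc frame that shows up in the call trace.
PATTERN = re.compile(r"LBUG|ASSERTION", re.IGNORECASE)


def find_lbug_lines(lines):
    """Return the log lines that mention LBUG or ASSERTION."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_file(path):
    """Scan one syslog file; the path is supplied by the caller (assumption)."""
    with open(path, errors="replace") as log:
        return find_lbug_lines(log)
```

For example, `scan_file("/var/log/messages")` would return the matching lines on a node that logs kernel messages there; the actual syslog location depends on the distribution.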
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss