We've been seeing occasional hangs on our MDS and I'd like to see if anyone else is seeing this or can provide suggestions on where to look. This might not even be a Lustre problem at all.
We're running Lustre 1.8.4 with OFED 1.5.2 on kernel 2.6.18-194.3.1.el5_lustre.1.8.4. At some point, something in the IB stack appears to go out to lunch: pings to the IPoIB interface time out, and anything that touches IB (perfquery, etc.) goes into a hard hang and cannot be killed. Once this happens, the only fix is to power-cycle the machine, since shutdown and reboot hang as well.

From what I can see, the first abnormal entries in the system logs on the MDS are messages showing that connections to the OSSes are timing out.

Any insight would be appreciated.

Thanks,
Kevin