We've been seeing occasional hangs on our MDS and I'd like to see if 
anyone else is seeing this or can provide suggestions on where to look.
This might not even be a Lustre problem at all.

We're running Lustre 1.8.4 with OFED 1.5.2, and kernel version 
2.6.18-194.3.1.el5_lustre.1.8.4.

The problem is that at some point something in the IB stack appears to 
go out to lunch: pings to the IPoIB interface time out, and anything 
that touches IB (perfquery, etc.) goes into a hard hang and cannot be 
killed.
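
A minimal sketch for checking this symptom from a shell that is still 
responsive. It assumes, without confirmation, that the unkillable 
processes are sitting in uninterruptible sleep (state D in 
/proc/<pid>/stat). The function names are hypothetical, and it assumes 
Python 3, which is newer than the Python shipped with an EL5 system.

```python
import os

def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar].strip())
    comm = text[lpar + 1:rpar]
    state = text[rpar + 1:].split()[0]
    return pid, comm, state

def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for processes currently in state D."""
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable mid-scan
        if state == "D":
            hung.append((pid, comm))
    return hung
```

Calling uninterruptible_processes() returns a list of (pid, comm) pairs. 
If tools such as perfquery show up in it, that would be consistent with 
them being blocked inside the kernel rather than in user space, though a 
brief D state on its own is normal during ordinary I/O.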

The only solution to the problem once it occurs is to power-cycle the 
machine, as shutdown/reboot hang as well.

From what I can see, the first abnormal entries in the system logs on 
the MDS are messages showing that connections to the OSSes are timing out.

Any insight would be appreciated.

Thanks,

Kevin
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
