Sorry, I have to correct this. It should read: "the nodes CANNOT mount the storage and I can't access the Lustre server machine either".
On Wednesday 17 July 1392 at 11:21, Arya Mazaheri wrote:
> Hi everyone,
> I have had a problem lately with our Lustre 1.8 deployment. It crashes
> periodically in such a way that the nodes can mount the storage and I can't
> access the Lustre server machine either. So I have to restart the machine
> manually every time to get everything back to normal. I checked the logs,
> memory usage and lock counts to see whether any of these might be the cause
> of the problem, but I don't think they account for it.
> An interesting symptom I see every time this happens is that the network
> activity lights on the Infiniband switch blink very fast. I think heavy
> traffic on the Infiniband network to the Lustre server may be causing the
> server to crash. Does this connection seem logical?
>
> Anyway, I hope some of you have experienced this problem before and can
> help me understand what is happening and how to avoid crashing the server
> again!
>
> Thanks,
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss