Sorry, I have to correct this: "the nodes CANNOT mount the storage and I can't
access the Lustre server machine either".


On Wednesday 17 July 1392 at 11:21, Arya Mazaheri wrote:

> Hi everyone,  
> I have been having a problem lately with our Lustre 1.8 deployment. It
> crashes periodically in such a way that the nodes can mount the storage and
> I can't access the Lustre server machine either, so I have to restart the
> machine manually every time to get everything back to normal. I checked the
> logs, memory usage and lock counts to see whether any of them might be the
> cause of the problem, but I don't think they account for it.
> An interesting symptom I see every time this happens is that the network
> activity lights on the InfiniBand switch blink very fast. I suspect that
> heavy traffic on the InfiniBand network to the Lustre server may be causing
> the server to crash. Does this connection seem plausible?
>  
> Anyway, I hope some of you have experienced this problem before and can
> help me understand what is happening and how to keep the server from
> crashing again!
>  
> Thanks,  

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
