[Lustre-discuss] Lustre crashes periodically
Hi everyone,

I have been having a problem lately with our Lustre 1.8 deployment. It crashes periodically in such a way that the nodes can mount the storage and I can't access the Lustre server machine either, so I have to restart the machine manually every time to get everything back to normal.

I checked the logs, memory usage and lock counts to see whether any of them might be the cause, but I don't think they account for this issue. One interesting symptom I see every time this happens is that the network activity lights on the InfiniBand switch blink very fast. I suspect that heavy InfiniBand traffic to the Lustre server may be causing the crash. Does that connection seem plausible?

Anyway, I hope some of you have experienced this problem before and can help me understand what is happening and how to keep the server from crashing again!

Thanks,

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
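The checks mentioned above (memory usage, lock counts, InfiniBand traffic) are easier to correlate with a hang if they are logged continuously to local disk, so that the last samples before the freeze survive the reboot. What follows is a minimal Python 3 sketch under stated assumptions: it reads `/proc/meminfo`, sums the per-namespace `lock_count` files under `/proc/fs/lustre/ldlm/namespaces/` (the layout assumed for Lustre 1.8-era servers; verify it on your system), and sums the InfiniBand `port_rcv_data` counters under `/sys/class/infiniband/`, which count 4-byte units. The function names and log format are hypothetical, chosen for illustration only. On older HCAs these 32-bit counters can saturate, so treat the receive rate as approximate.

```python
import glob
import os
import time


def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into {field name: first numeric value} (mostly kB)."""
    info = {}
    with open(path) as f:
        for line in f:
            name, _, rest = line.partition(":")
            fields = rest.split()
            if fields:
                info[name.strip()] = int(fields[0])
    return info


def total_lock_count(ns_root="/proc/fs/lustre/ldlm/namespaces"):
    """Sum the lock_count file of every LDLM namespace (assumed layout)."""
    total = 0
    for path in glob.glob(os.path.join(ns_root, "*", "lock_count")):
        with open(path) as f:
            total += int(f.read().strip() or 0)
    return total


def ib_rcv_bytes(ib_root="/sys/class/infiniband"):
    """Sum port_rcv_data over all HCA ports, converted to bytes.

    The counter counts 4-byte units, hence the multiplication by 4.
    """
    total = 0
    pattern = os.path.join(ib_root, "*", "ports", "*", "counters", "port_rcv_data")
    for path in glob.glob(pattern):
        with open(path) as f:
            total += int(f.read().strip()) * 4
    return total


def sample_loop(log_path, interval=10.0, iterations=None,
                meminfo="/proc/meminfo",
                ns_root="/proc/fs/lustre/ldlm/namespaces",
                ib_root="/sys/class/infiniband"):
    """Append one line per interval; iterations=None means run forever."""
    prev_rx, prev_t = ib_rcv_bytes(ib_root), time.monotonic()
    done = 0
    with open(log_path, "a") as log:
        while iterations is None or done < iterations:
            time.sleep(interval)
            mem = read_meminfo(meminfo)
            rx, now = ib_rcv_bytes(ib_root), time.monotonic()
            elapsed = now - prev_t
            # receive rate in MB/s since the previous sample
            rate = (rx - prev_rx) / elapsed / 1e6 if elapsed > 0 else 0.0
            log.write("%s MemFree=%dkB Slab=%dkB locks=%d ib_rx=%.1fMB/s\n" % (
                time.strftime("%Y-%m-%d %H:%M:%S"),
                mem.get("MemFree", -1), mem.get("Slab", -1),
                total_lock_count(ns_root), rate))
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before a possible hang
            prev_rx, prev_t = rx, now
            done += 1
```

For example, running `sample_loop("/var/log/lustre-health.log", interval=10)` in the background on the server logs one line every 10 seconds. Calling `os.fsync` after each line costs a little I/O but ensures that samples already written are on disk if the machine then hangs, so a spike in IB traffic, lock count or slab usage just before the freeze would be visible after the reboot.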
Re: [Lustre-discuss] Lustre crashes periodically
Sorry, I have to correct this: the nodes CANNOT mount the storage, and I can't access the Lustre server machine either.

On Wednesday, 17 Mehr 1392 (9 October 2013) at 11:21, Arya Mazaheri wrote:
> [quoted text hidden]
Re: [Lustre-discuss] Lustre crashes periodically
Did you run lfsck against it? Are there no kernel crash dumps? Maybe it's not a Lustre-related problem at all? If you don't have an active/passive MDS setup, the Lustre file system will be unusable whenever the MDS crashes, for whatever reason.

Abraham Alawi
Linux/UNIX Systems and Storage Specialist | STACC Project | Information Management Technology (IMT) | CSIRO

From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Arya Mazaheri
Sent: Wednesday, 9 October 2013 6:52 PM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Lustre crashes periodically

[quoted text hidden]
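The question about kernel crash dumps is worth checking before the next hang: if no crash kernel is loaded, a panic leaves nothing behind to analyse. The Python 3 sketch below is a readiness check only, not a kdump configuration tool, and its function names are hypothetical. It looks for a `crashkernel=` reservation on the kernel command line (`/proc/cmdline`), checks whether a crash kernel is actually loaded (`/sys/kernel/kexec_crash_loaded` reads `1`), and checks whether the magic SysRq key is fully enabled (`/proc/sys/kernel/sysrq` reads `1`), which would let an administrator force a dump with SysRq-c on a node that hangs without panicking.

```python
def _read(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def crash_dump_readiness(cmdline="/proc/cmdline",
                         kexec_flag="/sys/kernel/kexec_crash_loaded",
                         sysrq="/proc/sys/kernel/sysrq"):
    """Report which prerequisites for capturing a kernel crash dump are met."""
    args = (_read(cmdline) or "").split()
    return {
        # memory reserved for the capture kernel at boot
        "crashkernel_reserved": any(a.startswith("crashkernel=") for a in args),
        # the capture kernel has actually been loaded (e.g. by the kdump service)
        "crash_kernel_loaded": _read(kexec_flag) == "1",
        # all SysRq functions enabled, including SysRq-c (force a crash)
        "sysrq_fully_enabled": _read(sysrq) == "1",
    }


if __name__ == "__main__":
    for check, ok in crash_dump_readiness().items():
        print("%-22s %s" % (check, "OK" if ok else "MISSING"))
```

Running it on the MDS prints one OK/MISSING line per check. On RHEL-family systems the `kdump` service is what normally loads the crash kernel, and dumps are usually written under `/var/crash`, which is where to look after the next incident.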