Re: [lustre-discuss] MGS failover problem

Vicker, Darby (JSC-EG311) Wed, 11 Jan 2017 09:31:13 -0800

> The question I have in this is how long are you waiting, and how are you
> determining that lnet has hung?


In the example I sent earlier today, I waited about 10 minutes. The other day, 
it looks like I waited about 20 minutes before rebooting, because I couldn't 
kill lnet. I'm calling it hung because even 10 minutes seems excessive, and 
also because of the stack trace.
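To make the "hung" call less of a judgment made after 10 or 20 minutes, one option is to run the unload under a fixed time limit and, if it is still running when the limit expires, capture the kernel stack of the stuck process. This is a minimal bash sketch, not part of the Lustre tools: the function name `run_or_report_hang` is hypothetical, and `lustre_rmmod` is only one example of a command you might wrap. It deliberately does not kill the process, since a task blocked in the kernel may not die anyway.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Lustre): run a command in the background,
# wait up to LIMIT seconds, and report it as hung if it is still running.
run_or_report_hang() {
    local limit="$1"; shift
    "$@" &
    local pid=$!
    local waited=0
    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if (( waited >= limit )); then
            echo "HUNG: '$*' (pid $pid) still running after ${limit}s"
            # The kernel stack shows where the task is blocked (root only).
            if [[ -r /proc/$pid/stack ]]; then
                cat "/proc/$pid/stack"
            fi
            # Leave the process alone; 124 mirrors coreutils timeout's code.
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # Finished within the limit: return the command's own exit status.
    wait "$pid"
}
```

For example, `run_or_report_hang 600 lustre_rmmod` run as root would either return the unload's exit status or print the stack of the blocked task after 10 minutes.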


> How are you specifying --failnode for your configuration?  If you could
> run tunefs.lustre on the MDT/MGS and an OST, that would be very helpful.

We are not using --failnode; we are using --servicenode, since the admin manual 
indicates that --failnode has some disadvantages compared to --servicenode. See 
the original post for how we formatted the MDT and OSTs (not optimally).

http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-January/014125.html

However, the service node options were corrected with a tunefs.lustre command; see:

http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-January/014129.html
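For anyone comparing configurations, a target that should follow the MGS across failover needs every MGS NID recorded; one common form is one `--mgsnode` option per MGS server, alongside one `--servicenode` per server that can mount the target. The bash sketch below only assembles and prints such a command with `--dryrun` (which makes tunefs.lustre report what it would do without writing), so it can be reviewed first. The function name `build_tunefs_cmd`, the device `/dev/ost0`, and the OSS NIDs `192.0.2.1@tcp`/`192.0.2.2@tcp` are hypothetical placeholders, not taken from the posts linked above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) a tunefs.lustre dry-run command with
# one --mgsnode per MGS server and one --servicenode per serving node.
build_tunefs_cmd() {
    local device="$1" mgs_nids="$2" svc_nids="$3"
    local -a args=(tunefs.lustre --dryrun)
    local nid
    # The NID lists are space-separated, so word splitting is intentional.
    for nid in $mgs_nids; do args+=("--mgsnode=$nid"); done
    for nid in $svc_nids; do args+=("--servicenode=$nid"); done
    args+=("$device")
    # "${args[*]}" joins the arguments with spaces for display.
    printf '%s\n' "${args[*]}"
}
```

Usage: `build_tunefs_cmd /dev/ost0 "192.52.98.30@tcp 192.52.98.31@tcp" "192.0.2.1@tcp 192.0.2.2@tcp"` prints a single command line that can be checked before dropping `--dryrun`.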


> Finally, how are you specifying the mount string on your various clients?

mount -t lustre 192.52.98.30@tcp:192.52.98.31@tcp:/hpfs-fsl /tmp/lustre_test/
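In that mount string, the colon separates the two failover MGS nodes (a comma would instead separate multiple NIDs belonging to the same node), and `:/hpfs-fsl` names the file system. A small bash sketch that builds the same string from a list of MGS NIDs; the function name `build_mount_spec` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Lustre client mount source from a file system
# name and one NID per MGS server. Colons separate failover MGS nodes.
build_mount_spec() {
    local fsname="$1"; shift
    if (( $# == 0 )); then
        echo "usage: build_mount_spec FSNAME MGS_NID..." >&2
        return 1
    fi
    # "$*" joins the remaining arguments with the first character of IFS.
    local IFS=:
    printf '%s:/%s\n' "$*" "$fsname"
}
```

Usage: `mount -t lustre "$(build_mount_spec hpfs-fsl 192.52.98.30@tcp 192.52.98.31@tcp)" /tmp/lustre_test/` reproduces the command above.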

The clients, however, seem to be fine. It's the OSSs that don't seem to be 
picking up the new MGS.
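One way to see which MGS an OSS is actually talking to is to inspect its MGC import with `lctl get_param -n mgc.*.import`. The bash sketch below assumes the YAML-like import output that Lustre prints, with a `name:` line and a `current_connection:` line per import; field names may vary between releases, so check them against your version's output. The function name `mgc_current_connection` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lctl get_param -n mgc.*.import` output on stdin
# and print each MGC's name with the NID it is currently connected to.
# Assumes a "name:" line and a "current_connection:" line per import.
mgc_current_connection() {
    awk '$1 == "name:" { name = $2 }
         $1 == "current_connection:" { print name, $2 }'
}
```

Usage on an OSS: `lctl get_param -n mgc.*.import | mgc_current_connection`. If the printed NID still points at the old MGS after failover, the OSS has not reconnected.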

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
