> The question I have in this is how long are you waiting, and how are you
> determining that lnet has hung?
In the example I sent today, I waited about 10 minutes. The other day it looks like I waited about 20 minutes before rebooting, since I couldn't kill lnet. I'm calling it hung because even 10 minutes seems excessive, and also because of the stack trace.

> How are you specifying --failnode for your configuration? If you could
> run tunefs.lustre on the MDT/MGS and an OST, that would be very helpful.

We are not using --failnode; we are using --servicenode, since the admin manual indicates --failnode has some disadvantages compared to --servicenode. See the original post for how we formatted the MDT and OSTs (not optimal):

http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-January/014125.html

The service node options were later corrected with a tunefs.lustre command; see:

http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-January/014129.html

> Finally, how are you specifying the mount string on your various clients?

mount -t lustre 192.52.98.30@tcp:192.52.98.31@tcp:/hpfs-fsl /tmp/lustre_test/

But the clients seem to be just fine. It's the OSSs that don't seem to be picking up the new MGS.
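
For anyone reading along, the client mount string above lists two MGS NIDs separated by a colon (primary first, then the failover node), followed by the file system name. Below is a minimal Python sketch of how that string breaks down. The helper name parse_lustre_mount_spec is hypothetical and not part of Lustre, and the sketch assumes simple IPv4 NIDs of the form addr@net with no subdirectory after the file system name.

```python
def parse_lustre_mount_spec(spec):
    """Split '<mgsnid>[:<mgsnid>...]:/<fsname>' into (nids, fsname).

    Illustrative only: assumes colon-separated failover NIDs and a
    plain file system name after the final ':/'.
    """
    # The last ':/' separates the MGS NID list from the file system name.
    nid_part, sep, fsname = spec.rpartition(":/")
    if not sep or not nid_part or not fsname:
        raise ValueError(f"not a Lustre mount spec: {spec!r}")
    # Colons separate the MGS failover nodes, in the order they are tried.
    nids = nid_part.split(":")
    return nids, fsname


nids, fsname = parse_lustre_mount_spec(
    "192.52.98.30@tcp:192.52.98.31@tcp:/hpfs-fsl"
)
print("MGS NIDs:", nids)
print("fsname:", fsname)
```

The clients are given both MGS NIDs explicitly in this string, which may be part of why they behave differently from the OSSs, whose MGS NIDs come from the parameters stored on each target.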
