I ran a simple test: unmount the MDT and observe how the clients behave. During the test, all clients blocked until a few seconds after the MDT was re-mounted. We want the system to respond quickly: reporting an error is acceptable, but blocking for 10 seconds is not. Whenever a server node, a block device, or a network connection fails, could Lustre just report an error instead of trying to recover?
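To make the test concrete, here is a rough sketch of how the blocking can be measured from a client. It is only an illustration under assumptions: the function name `probe_metadata`, the 2-second limit, and the mount point `/mnt/lustre` are hypothetical, and it uses nothing Lustre-specific. It only bounds how long the caller waits; the `stat()` call itself may stay blocked in the helper thread until the MDT comes back.

```python
import os
import threading
import time


def probe_metadata(path, timeout_s=2.0):
    """Stat `path` in a helper thread and wait at most `timeout_s` seconds.

    Returns (status, elapsed_seconds); status is "ok", "error" or "timeout".
    """
    result = {}

    def worker():
        try:
            os.stat(path)  # a metadata operation, served by the MDT on Lustre
            result["status"] = "ok"
        except OSError:
            result["status"] = "error"

    start = time.monotonic()
    # Daemon thread: a stat() stuck in the kernel must not keep the process alive.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)  # returns early if the stat finishes before the limit
    elapsed = time.monotonic() - start
    # If the worker has not written a status yet, treat the probe as timed out.
    return result.get("status", "timeout"), elapsed


if __name__ == "__main__":
    # "/mnt/lustre" is a hypothetical mount point used only for illustration.
    print(probe_metadata("/mnt/lustre"))
```

What we are hoping for is that such a probe returns "error" quickly while the MDT is down, rather than "timeout".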
Thanks,
Yao