On Jul 26, 2019, at 04:28, Thomas Roth <t.r...@gsi.de> wrote:
> Hi all,
>
> this morning one of our MDTs went 'unhealthy':
>
> Jul 26 10:15:13 lxmds20 kernel: LustreError: 9510:0:(service.c:3285:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1017s
>
> However, somewhat later,
>
> lxmds20:~# cat /sys/fs/lustre/health_check
> healthy
>
> and all Lustre operations seem to be good, too.

This means that some RPC was stuck, but once that RPC eventually completes, there is no longer any reason for the MDS to report "unhealthy".

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud
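To illustrate why the status recovers on its own, here is a toy Python model (not Lustre's code) in which health is judged only from the oldest request still waiting in a service queue, so nothing is latched once that request completes. The WAIT_LIMIT_S threshold, the ServicePartition and Request classes, and their method names are hypothetical placeholders; the real check is ptlrpc_svcpt_health_check() in service.c, which applies its own limit.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical wait limit in seconds; a placeholder, not Lustre's actual value.
WAIT_LIMIT_S = 600


@dataclass
class Request:
    rid: int        # request id
    arrival: float  # time the request was queued, in seconds


class ServicePartition:
    """Toy model of one service partition's pending-request queue."""

    def __init__(self, wait_limit: float = WAIT_LIMIT_S):
        self.pending = deque()
        self.wait_limit = wait_limit

    def enqueue(self, rid: int, now: float) -> None:
        # Assumes enqueue is called in arrival order, so pending[0] is the oldest.
        self.pending.append(Request(rid, now))

    def complete(self, rid: int) -> None:
        # Drop the request once it has been handled (or finally completes).
        self.pending = deque(r for r in self.pending if r.rid != rid)

    def health_check(self, now: float):
        """Return (healthy, message), based only on the oldest pending request."""
        if not self.pending:
            return True, "healthy"
        waited = now - self.pending[0].arrival
        if waited > self.wait_limit:
            return False, f"unhealthy - request has been waiting {waited:.0f}s"
        return True, "healthy"


# A request stuck for 1017s makes the service unhealthy; once it completes,
# the very next check reports healthy again.
svc = ServicePartition()
svc.enqueue(1, now=0)
print(svc.health_check(now=1017))  # (False, 'unhealthy - request has been waiting 1017s')
svc.complete(1)
print(svc.health_check(now=1100))  # (True, 'healthy')
```

The design point the model captures is that the check is recomputed from the current queue each time it runs, which matches the behaviour Thomas saw: an error logged while the RPC was stuck, then "healthy" in /sys/fs/lustre/health_check afterwards.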
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org