[ceph-users] Host crash undetected by ceph health check

Frank Schilder Fri, 07 May 2021 17:07:15 -0700

Dear cephers,

today I seem to have observed an impossible event for the first time: an OSD host 
crashed, but ceph health monitoring did not detect the crash. Not a 
single OSD was marked down, and IO simply stopped, waiting for the crashed OSDs 
to respond. All that was reported was slow ops, slow metadata IO and MDS behind 
on trimming, but no OSD failure. I have rebooted these machines many times, and 
the health check has always recognised the outage instantly. The only 
difference I can see is that those were clean shutdowns, not crashes (I believe 
the OSDs mark themselves down on a clean shutdown).

To help debug this problem, can anyone give me a pointer to how this 
could be the result of a misconfiguration?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io