Hello cephers,

I know that a similar question was posted 5 years ago. However, the answer 
was inconclusive for me.
I installed a new Nautilus 14.2.1 cluster and started pre-production testing. 
I followed the Red Hat documentation and simulated a soft disk failure with:

#  echo 1 > /sys/block/sdc/device/delete

The cluster was idle at the time, being new and all. I noticed some 
disk-related errors in dmesg, but that was about it.
It looked to me as if the failure had not been detected for the next 20 - 30 
minutes. All OSDs were up and in, and health was OK. The OSD logs had no 
smoking gun either.
After 30 minutes, I restarted the OSD container, and it failed to start, as 
expected.

Later on, I performed the same operation during a fio benchmark, and the OSD 
failed immediately.

My question is: should the disk problem have been detected quickly, even on 
an idle cluster? I thought Nautilus had the means to sense a failure before 
intensive I/O hits the disk.
Am I wrong to expect that?


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
