[ceph-users] Why one crippled osd can slow down or block all request to the whole ceph cluster?

shadow_lin Sat, 03 Mar 2018 23:29:08 -0800

Hi list,
During my testing of Ceph, I sometimes find that the whole cluster is blocked, 
and the cause turns out to be a single malfunctioning OSD. Ceph can heal itself 
when an OSD is down, but if an OSD is half dead (it still sends heartbeats but 
cannot handle requests), then all requests directed to that OSD are blocked. 
If all OSDs are in one pool, the whole cluster ends up blocked because of that 
one hung OSD.
I think this is because Ceph distributes requests across all OSDs, and if one 
OSD never confirms that a request is done, everything waits on it.
Is there a way to have Ceph mark the crippled OSD down when requests directed 
to it have been blocked for longer than a certain time, so that the whole 
cluster does not get blocked?
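
To make the idea concrete, here is a rough sketch of the policy I have in mind, 
written as an external watchdog rather than anything Ceph does itself. The 
function names, the input mapping (OSD id to age in seconds of its oldest 
blocked request) and the threshold are all hypothetical. How the ages are 
collected is left out. `ceph osd down <id>` is the existing CLI command for 
marking an OSD down, but an OSD daemon that is still heartbeating may simply 
report itself up again, so this only illustrates the decision and is not a 
tested fix.

```python
from typing import Dict, List


def osds_to_mark_down(blocked_secs: Dict[int, float],
                      threshold_secs: float) -> List[int]:
    """Return ids of OSDs whose oldest blocked request exceeds the threshold.

    blocked_secs is a hypothetical input: OSD id -> age in seconds of the
    oldest request still blocked on that OSD.
    """
    return sorted(osd for osd, age in blocked_secs.items()
                  if age > threshold_secs)


def down_commands(osd_ids: List[int]) -> List[List[str]]:
    """Build (but do not run) one `ceph osd down <id>` command per OSD."""
    return [["ceph", "osd", "down", str(osd)] for osd in osd_ids]


if __name__ == "__main__":
    # Made-up ages for illustration only; not real cluster output.
    ages = {0: 1.0, 3: 120.0, 7: 45.5}
    for cmd in down_commands(osds_to_mark_down(ages, threshold_secs=30.0)):
        print(" ".join(cmd))
```

The commands are only printed here, so an operator can review them before 
running anything against a real cluster.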


2018-03-04


shadow_lin

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
